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transforms (NTT's), and presents original examples to 
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manner. 

Software and hardware implementation of Fermat number 
transforms are discussed and compared with the Fourier 
Transform showing a substantial improvement in efficiency 
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and word length. Methods and other NTT's, for overcoming 
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I. 



INTRODUCTION 



With the rapid advances in large scale integration, a 
growing number of complex digital signal processing appli- 
cations are becoming economically feasible. In most cases 
the bulk of processing workload appears to consist of 
digital filter computation. Future progress in digital 
signal processing, either towards high speed, real time 
operation or increased sophistication, thus largely depends 
on increased efficiency in digital filtering computation. 
This can be achieved not only by implementing improved 
filter circuits but also by using better computational 
algorithms as will be discussed in this thesis. 

Schonhaje and Strussen [1] (also see text by Knuth 

[2, p.269]) defined Fourier-like transforms over the ring 

2 n 

of integers, modulo the Fermat numbers [2] 2 + 1, to 

yield convolutions. They showed that such convolutions 
can be used to perform fast integer multiplications . Rader 
[3] and Agarwal and Burrus [4] also defined Fourier-like 
transforms over residue classes of integers, modulo the 
Fermat and Mersenne primes , to compute convolutions of the 
real integer sequences. 

For this presentation and exposition of Number Theoretic 
Transforms (NTT) , the works of many different authors were 
consulted. Similar concepts have been studied and compared 
in order to present them in a cohesive and unified manner. 



9 



In section II a mathematical framework for these new 
transforms is presented. This framework is based on the 
theory of congruences, modulo M, which belongs to the general 
area of what is often called "number theory." Number theory 
is very old, going back several thousand years; at least as 
far back as Euclid, who proved some of the original results. 
The names of many other famous mathematicians are also asso- 
ciated with the theory, including Joseph Lagrange (1763- 
1813) , and Leonard Euler (1707-1783) , and Carl Gauss (1777- 
1855) who is responsible for many contributions to the area, 
some of which were published in his book Disguisitiones 
Arithmeticae , published in 1801, when he was 24 [5] . An 
excellent history of the theory of numbers is found in the 
work of Dickson [6] . 

In recent years, there has been increasing interest in 
the practical applications of various parts of number theory, 
including the theory of residue number systems. There has 
been some work on the use of these number systems in general- 
purpose computers [7] -[12], although this line of investiga- 
tion has not yielded many practical results due to the 
difficulty of determining the sign of numbers expressed in 
residue number system notation. More promising results have 
been obtained in applications where sign detection is not 
required, such as number theoretic transforms [3] , [4] 

and [13] . 

The possibility of performing the fast Fourier transform 
(FFT) in finite algebraic systems [14], [15] and [1], is 
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being increasingly discussed as a means of digital filtering 
[3], [13] and [16] (see also references in Reference [13]). 

The convolution property which such transforms share with 
the conventional f.f.t. can be employed to construct a non- 
recursive digital filter in which rounding error does not 
occur. 

The following type of transforms have been discussed in 
the literature. 

(a) Transforms in arithmetic mod p, p prime, in which 
the order (number of points) , d, for example, is 
a factor of p-1. 

(b) Transforms in arithmetic mod m, m arbitrary, in 
which d dividies p-1 for each prime factor p of 

m. 

(c) Transforms in an arbitrary finite Galois field 
GF (p n ) of p n elements, where d divides p n -l . 

Here class (a) is treated as a special case of both (b) and 
(c) , which are essentially different. This material is 
presented in section III which discusses the Fourier trans- 
form in a finite field in a broad way. 

The best known number-theoretic transform is the Fermat- 
number transform [1], [3], [4], [13], [16]. Due to the 

simplicity of its arithmetic, such a transform is the fastest 
method known so far for computing integer convolutions 
under certain conditions. However, this transform suffers 
from the disadvantage that the restriction imposed on the 
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register word length is often too excessive [13] . In 
order to remedy this problem, Rader [3] has suggested using 
a two-dimensional convolution scheme to convolve long one- 
dimensional sequences, and Agarwal and Burrus [17], [18] have 

presented such a two-dimensional convolution scheme. Other 
authors [19] , [20] , [21] , [22] , [23] have also considered 

other number-theoretic transforms (NTT) . Section IV dis- 
cusses the various types of NTT. 

These transforms provide more choices of word length 
and transform length, at speeds which are not attainable by 
the Fermat-number transforms under similar conditions. 
Problems involved in the implementation of Fermat-number 
transforms are discussed in section V. 

Section VI deals with the comparison between the FFT and 
the FNT . Software and hardware requirements for both are 
analyzed and compared. Agarwal [13] programmed Fermat 
number transforms on the IBM 370/155 computer [13] and showed 
how to compute convolutions approximately three times as 
fast as the FFT implementation for the same convolution. 
However, their main drawback is a rigid relationship between 
word length and obtainable transform length, as well as a 
limited choice of possible word lengths. This last point 
is especially significant for FNT's, and may result in a 
waste of computing power when the permissible word lengths 
do not correspond to the dynamic range required for the 
convolution. 
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In principle. Number Theoretic Transforms (NTT) could 
be implemented in the same was as Discrete Fourier Trans- 
forms with multiplications by trigonometric functions 
replaced by multiplications by powers of two, all operations 
being performed modulo a Mersenne or Fermat Number. When 
the transforms have a composite number of terms, as is the 
case with Fermat Number Transforms (FNT) or some pseudo- 
Mersenne Transforms [24] , various pipeline computing tech- 
niques can be used [25] . 

In practice, however, direct transposition of Fast 
Fourier Transforms (FFT) architectures does not necessarily 
lead to optimum implementations and the development of 
special configurations for computing NTT seems worth exploring. 
Along these lines, McClellan [26] has proposed a new coding 
technique for simplifying the implementation of Fermat 
Number Transforms. 

McClellan implemented a FNT convolver for radar signal 
processing and reached some interesting conclusions. 

Leibowitz [27] presents a code translation which is mathe- 
matically simpler, and this proposed arithmetic provides 
simpler realizations of all operations required to compute 
the FNT. 

Nussbaumer [28] discusses the implementation of pseudo- 
Mersenne and Fermat Number Transforms. He shows that some 
pseudo-Mersenne Transforms can be computed efficiently by a 
linear filtering approach. This approach is extended to 
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cover the case of Fermat and pseudo-Fermat Number Transforms 
by using a special coding scheme for implementing arithmetic 
operations in a Fermat Number system. A number of sugges- 
tions have arisen for lengthening the sequences which can 
be handled by the NTT. One suggestion is to perform the 
calculation modulo several mutually prime modulo, and then 
obtain the desired result by using the Chinese Remainder 
theorem [13], [29]. Reed and Truong [30] have also shown 

how one can extend the method to Galois fields over complex 
integers modulo Mersenne primes to enable one to use the 
FFT algorithm to compute convolutions of complex sequences, 
and to lengthen the sequences which the method can handle. 
However, because this method requires several multiplications, 
it does not seem very promising. 

One of the most promising methods for lengthening the 
sequence one can handle has been suggested by Rader [3] 
and developed by Agarwal and Burruo [31] . This consisted of 
mapping the one-dimensional sequences into multidimensional 
sequences and expressing the convolution as a multidimen- 
sional convolution. In Appendix A an explanation of the 
process of two dimensional convolution for convolving long 
sequences is presented. By the use of an example it is 
shown that this process improves the length of the sequences 
handled by the NTT. 

With knowledge of the advantages and disadvantages of 
Fermat Number Transforms, it is possible to speculate on 
just what type of problems are likely to' benefit from this 
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new approach. In general one looks for problems which have 
the following characteristics: 

1) Fairly short sequences (about fifty delayed products) 

2) A high accuracy requirement 

3) Implementations where multiplications are very much 
more costly than additions. 

Two specific situations come to mind. The first is the 
estimation of magnitude spectra of (many simultaneous) 
wideband signals. The theory of power spectra estimation 
states that power density computations involves formulating 
a correlation function with a finite number of delays 
which are a fraction of the number of data points available. 

The second situation is two dimensional finite impulse 
response filtering. Here we may consider an arbitrary LxL 
impulse response to be applied to a large image. If L is 
in the range of 52 points, the FFT is not particularly attrac- 
tive for convolution, although more so for two dimensional 
convolution than for one dimensional convolution. 

The Fermat Number Transform is quite effective, however. 
For impulse responses of this size, the number of multipli- 
cations is reduced by about two orders of magnitude over the 
direct method, in exchange for a number of additions which 
are not too different (usually less) than required for the 
direct method. Thus it can be expected that the Fermat 
number transform will soon play a part in the computation 
required for the filtering of pictures. 
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Recently, Derorae [32] discusses a class of NTT's based 
on three bit primes, having many of the computational 
advantages of the FNT's. These NTT's can have much larger 
transform lengths than those for FNT's so that the fast 
convolution of, for example, a 1000 x 1000 point picture 
with a 24 x 24 point spread function should be possible in 
a minicomputer! This development was undertaken in connec- 
tion with analysis of high resolution electron micrographs. 
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II. MODULAR ARITHMETIC 



Let a and b be integers. We say that "a divides b" 
(denoted by "a|b") and "b is multiple of a" if there exists 
an integer c such that b = ac. 

If a does not divide b, we denote the fact by "a^b". 

An important theorem concerning the division of integers 
(the Division Algorithm), is: 

Let a and b be integers, b not zero. Then there exist 
two unique integers, q and r, such that 

a = bq + r and 0 _< r _< b (2.1) 

The integers q and r, are called the "quotient" and the 
"remainder" respectively. 

If a, b and M, are integers, with M > 0, such that 

M| (a - b) (2.2) 

we say that "a is congruent to b, modulus M" , and we denote 
this fact, by writing 

a = b (mod M) (2.3) 

In other words, two integers a and b are congruent mod M, 
if M divides their difference. 
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We will refer to M as the "modulus". Note that 



1) 23=8 (mod 5) since 23 - 8 = 15 = 5-3 



and 



2) 23 = 3 (mod 5) since 23- 3= 20= 5-4 

are both true statements. 

We restrict our attention to what is called a complete 
residue system, mod M / as the set of integers 

Z M = {0, 1, 2 , ..., M— 1} . (2.4) 

In other words , when an integer a is divided by another M 
i.e. , 



a = KM + b 

where the remainder b/ is some positive integer less than 
M, there exists a congruence 

a = b (mod M) (2.5) 

such that b is a unique integer among the numbers 

0 , 1, 2 , ... M-l . 
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Thus, one possible mechanization of residue reduction, 
is to divide by the modulus and keep only the remainder. 
Examples : 

.47 

1) 47 = 2 (mod 9) since -g- = 5 remainder = 2 

8 1 

2) 81 e 0 (mod 27) since = 3 remainder = 0 

Both 2 and 0 are contained only once within the set of 

integers (0,1,..., 8} and {0,1,2, ..., 26} respectively. 

The last example shows that, in general instead of saying 
that a number a is divisible by the number M we can write 

a = 0 (mod M) 

For this means a - 0 = a = Mk, where k is some integer. 

For instance, instead of saying that a is an even 
number, we can write 

a E 0 (mod 2) 

In the same manner one sees that an odd number satisfies 
a E 1 (mod 2) . 

In working with residue reduction we will drop the 
symbol E for congruence and will use the symbol =. 
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But congruence is not the same as equality unless one 
can show separately that the difference between a and b is 
also less than M. The following basic arithmetic operations 
are permissible with modular arithmetic: 



a) 


Addition: 


2 + 6 = 8 = 1 (mod 7) 




b) 


Negation : 


-2 = -2 + 7 = 5 (mod 7) 




c) 


Subtraction : 


3-5=3+ (-5) = 3 + (-5 + 


7) 






= 5 (mod 7) 




d) 


Multiplication : 


3 x 6 = 18 = 4 (mod 7) 




e) 


Multiplicative 

inverse: 


Multiplicative inverse of an 


integer 



b in Z M exists if and only if b 
and M are relative primes . In 
that case b x b ^ = 1 (mod M) 

2^ = 4 (mod 7); 2x4 = 8 = 1 (mod 7) . 

f) Division: a/b exists if and only if b has 

an inverse. In that case 

a/b = axb” 1 ; 4/2 = 4 x 4 = 16 = 2 (mod 7). 

Note that because of the nature of modular arithmetic, 
numbers do not have sizes or magnitude. One cannot say 
that a particular number is larger than another or that 
numbers are close. 

Extracting the residue is a functional transformation 
of a into b. It occurs often enough in what follows that 
it deserves a special symbol, <•> . 
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b 



( 2 . 6 ) 



The subscript M may be omitted if it is understood from 
the context. 

Computations involving residues are usually simple 
because it is never necessary to work with quantities 
larger than the modulus unless one finds it convenient. 

Notice that 



<x+y> is the same as <<x> + <y>> 

<x-y> is the same as <<x> - <y>> (2.7) 

<xy> is the same as <<x> <y>> 

so that in any computation involving only +, x one may, at 

X 

their option, replace the result of any step by its residue. 
Example: 

<15+13> 7 = <<15> 7 + <13> 7 > 7 = <l+6> 7 = 0 

<12-11> 7 = <<12> 7 - <11> 7 > = <5 - 4> 7 = 1 

<9 x 14> 7 = <<9> 7 x <14> 7 > = <2 x 0> 7 = 0 

Note that if it were necessary to divide for residue 
reduction the operation would be quite costly. 
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Fortunately there are simpler techniques. In the 
simplest case - the residue or a decimal number modulo 10 
is its last decimal digit/ since 



<a> io = < l a i 1q1> io = <a i lol>> (2 - 8) 

i i 

and <a^ 10 1 >^ 0 is zero except for i = 0 term. 

Example : 

3 

<1098> 10 = < l a. 10 i > 10 

i=0 

= <1 x 10 3 + 0 x 10 2 + 9 x 10 1 + 8 x 10°> 10 



<1098> 



10 



8 (mod 10) 



This can be generalized to any radix. If M is a power of 
two, and a is represented on a binary machine, one has a 
trivial method of extracting <a> M * 



<a> 



,K 



- < l H 



< 2 » = 



K-l 



i=0 



a. 2 

l 



(2.9) 



This operation is performed by "masking out" all but the 
K least significant bits. 
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Example : 



2 



2 



<15> 



2 





i=0 



i=0 



< 1111 > 



3 



0 12 
1x2 + 1x2 + 1x2 



7 mod 8 



2 



i.e., here K = 3, and only the 3 least significant bits 
account for the value of the residue. The fourth bit (the 
most significant bit in this case) is "masked out." 

A. SOME IMPORTANT RESULTS IN MODULAR ARITHMETIC 

Euler's function is defined as $(M) , the number of 
integers in the finite set {0,1,2, ... M-l} (which is called 
the set of integers mod M, and denoted by Z^) that are 
relatively prime to M. 

For M a prime. 



Example: let M = 7. The ring of integers mod 7 is 

Zy = {0,1, 2, 3, 4, 5, 6}. Since M = 7, is a prime, the 
integers in Zy that are relatively prime to M = 7, are all 
elements of the set (except zero) , i.e. 



<KM) 



M-l 



( 2 . 10 ) 



<f>(M) = M-l 



<P ( 7) 



7-1 



6 
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If M is composite and its prime factored form is 



denoted by 



r l r 2 3 

M P 1 P 2 p £ 



( 2 . 11 ) 



then the general expression for <J> (M) is, [13] 



<|>(M) = M(1 - ^-) (1 - 



* (1 “ 

P£ 



( 2 . 12 ) 



Example: Let M = 12 = 2 x 3 



Z 12 = {0,1,2,3,4,5,6,7,8,9,10,11} 



by simple counting one finds 4 numbers that satisfy the 
conditions of being relatively prime to M. Checking by 
applying equation (2.12) 



<M12) = 



12(1 - ±) 



(1 - j) 



12 (|) (|) = 



An important theorem known as Euler ' s theorem states 
that for every a relative prime to M 



a <MM) 



1 (mod M) 



(2.13) 



For M prime, this reduces to Fermat's theorem 
a M 1 = 1 (mod M) 



(2.14) 
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which holds for all nonzero elements of Z M , since they 
are all relative prime to M, if M is prime by assumption. 
Example: Let M = 7 



Z ? = {0,1, 2, 3,4,5, 6} 

0(7) = 7-1 = 6 



which holds for all 

, /««x M _. non-zero elements 

or ' = a =1 (mod M) of Z M since they 

are all relative 
primes to M. 

Here : 

l 6 = 1 (mod 7) 

2 6 = 64 = 1 (mod 7) 

3 6 = 729 = 1 (mod 7) 

4 6 = 4096 = 1 (mod 7) 

5 6 = 15625 = 1 (mod 7) 

6~ = 46656 = 1 (mod 7) 



There are certain roots of unity that are of particular 
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interest. If N is the least positive integer such that 

a N = 1 (mod M) (2.15) 

then a is said to be a root of unity of order M, or simply 
of order N . 

If the order of a is equal to <J>(M) , then a is called a 
primitive root. 

If M is prime and a is a primitive root the set of 
integers 



(a K (mod M) , K = 0,1,2, ... M-2} (2.16) 

is the total set of non-zero elements in Z... Thus all 

I-i 

nonzero integers in Z^ can be generated by powers of a 
primitive root. This characterizes the entire field. 

Example : 

Let M = 7 



Z 7 = {0/1/2, 3/4, 5,6} 

Looking in a table of primitive roots, for example [13] 
we get 



3 and 5 are primitive roots of 7 , 



thus 
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The same for 



(3 K (mod 7), K = 0,1,2,. ..,5 = M-2} 



= U 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 } 



= {1,3, 2, 6, 4, 5} mod 7 



and those are all non-zero elements in Z. 



{ 5 K (mod 7), K = 0,1,2,.. .,5} 



= <5° r 5 1 # 5 2 r 5 3 r 5 4 # 5 5 > 



= {1,5, 4, 6, 2, 3} mod 1 



Euler's theorem implies that if a is of order N, then N 
must divide <f> (M) , denoted by 

N | <j> (M) (2.17) 



If M is a prime it can be shown [13] , that roots of 
order N exist if and only if 



N (M-l) 



(2.18) 



and the roots are given by [13] , 



a = a 



M-l/N 



<J> 



(2.19) 



where denotes a primite root. 
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Example: Let M = 7 <j>(7) 



M - 1 



6 



Possible order of roots: N = 1,2, 3,6 

primitive root (from table) : 3 

Order of roots: 




a 



= 36/3 = 3 2 



= 2 mod 7 



a = 3^/6 = 3 mod 1 



a = 2 order 3 a = 3 order (6 

/ 

primitive root 



More generally, if a is a root of order N then [13] 
a K is of order N/K, if K|N 

( 2 . 20 ) 

a is of order N, if N and K are 

relatively primes 



Example : Let M = 11 



4> (M) = 10 



Now a = 2 is a primitive root since 2 



10 



= 1 mod 11 and 



N = 10 is the least positive number such that 

N „N=10 1 j i i 

a = 2 =1 mod 11 



mod 7 
2 

t 

= <f> (7)) 
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Then 



a K = 2 2 = 4 mod 1 is a root of order N 10 / K=2 = 5 

This can be seen as follows: 



N = 



012345678 



,N 



145931459 



9 10 



3 1 



i.e., the root a = 4 generates a cyclic subset of the field 

with N = 5 distinct elements (order 5) . 

K 3 

For a = 2 =8 mod 11 since N = 10 and K = 3 are relatively 
primes a = 8 mod 11 will be a root of order N = 10, or a 
primitive root. 

Checking: 



N = 


0 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


8 N 


1 


8 


9 


6 


4 


10 


3 


2 


5 


7 


1 



Notice that (2.20) implies that the number of roots of 
order N, is given by cp (M) , and therefore the number of 
primitive roots is (j> (<j) (M) ) (since for a primitive root 
N = <f> (M) ) . 

Example: Let M=7 = {0,1, 2,3, 4, 5, 6} 

Number of primitive roots = <j>(<j>(7)) = <f>(6) = 6(1— ^)(1— j) 
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I 



as seen previously, 3 and 5 are primitive root mod 7. 

Number of roots of order N = 3 : <f>(3) = 3 — 1 = 2. 

a = 3 6/3 = 3 2 = 2 mod 7 a = 5 6/3 = 5 2 = 4 mod 7 

a =2 order 3 a = 4 order 3 

So a = 2 mod 7 and a = 4 mod 7 roots of order N = 3 . 

These relations will allow one to calculate all of the 
roots of all possible orders from one primitive root. 

We can summarize these ideas more precisely in the 
following: 

The highest possible exponent to which a number can 
belong mod M is <j> (M) 

(i.e. the highest order of a root is N = <J> (M) ) 

If a number has order N = <J> (M) , we shall call it 
a primitive root for the modulus M. 

Not every modul has primitive roots; for instance, 
for the modulus M = 15, one finds that every x in 
Z^, relatively prime to 15 satisfy the congruence 

x 4 = 1 (mod 15) 

and yet 4> (15) = 8 . 

To find the primitive roots of a modul if they exist, 
one must usually proceed by trial and error, although 
there are certain rules that may facilitate the search. 

Often one of the small number 2, 3, 5 or 6 may turn 
out to be a primitive root. 
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Extensive tables of primitive roots for primes 
have been computed. The first of these, the "Canon 
Arithmeticus" (1839) by K.G.J. Jacobi, included 
primitive roots for all primes below 1,000. More 
recent tables by Kraitchick, Cunningham, and others 
give primitive roots for all primes up to 25,000 
and even beyond. 

One interesting point, not mentioned so far, is the 
existence of multiplicative inverse. 

Multiplicative inverse of an integer b in Z^ exists if 
and only if b and M are relatively primes. 

In that case, b ^ is one integer such that 

b x b = 1 (mod M) (2.21) 

Example: Let M = 7 

5 1 = ? 5x5 ^ = 1 mod 7 5^ = 3 

Since 5 x 3 = 15 = 1 mod 7, i.e., if M is a prime for 
every non-zero integer a, in Z^, there exists an inverse 
[13] 



M-2 

a 



this can be seen by another example. 



(2.21a) 
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Example: M = 7 



1 1 = 1 72 = 1 5 = 1 mod 7 lxl' 1 = 1 mod 7 



2 1 = 2 7 2 = 2 5 = 32 = 4 mod 7 2x2 1 = 2x4 = 8 = 1 mod 7 



3 1 = 3 7 2 = 3 D = 243 = 5 mod 7 3x3 1 = 3x5 = 15 = 1 mod 7 



4 1 = 4 7 2 = 4 5 = 1024 = 2 mod 7 4x4 1 = 4x2 = 8 = 1 mod 7 



5 _1 = 5 7-2 - 5 5 = 3125 = 3 mod 7 5x5 1 = 5x3 = 15 = l mod 7 



6 1 = 6 7 2 = 6 5 = 7776 = 6 mod 7 6 x 6 _1 = 6x6 = 36 = 1 mod 7 



Note that for a non-prime M, a has an inverse given by 
[13]: 



a 



4>(M) -1 



( 2 . 22 ) 



if a and M are relatively primes, 



Example: 



N = 12 = 2 x 3 



4 > ( 12 ) = 12(1 - J ) (1 - |) 



12 (|) (|) 



= 4 



Z M=12 = {0/1/2/3,4,5,6,7,8,9,10,11} 



So there are 0( 12 ) = 4 elements in Z^/ (1,5,7,11) that are 

relatively prime to M = 12. 
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Those a's have inverses given by: 



, .-1 .. <J> (12) -1 .4-1 , 3 . . 

a 1 =l 1 = 1 T =1 =1=1 mod 12 

lx l" 1 = 1x1 = 1 mod 12 = i m od 12 



c e — 1 — d> (12) — 1 _4— 1 c 3 _ , 

a 2 =5 5 = 5 T =5 =5=5 mod 12 

5 x 5 _1 = 5x5 = 1 mod 12 



a 3 = 7 7 -1 = 7 ^ ( 12 ) — 1 = 7 4 " 1 = 7 3 = 7 mod 12 

7 x 7 _1 = 7x7 = 1 mod 12 



a 4 = 11 11 _1 = n<|)(12)-l = u 4-l = u 3 = n mod 12 



llxll " 1 = 11 x 11 = 1 mod 12 



This suggests that in this case b ^ = b. Therefore, we 
verify that by considering M a composite rather than a 
prime, one observes several differences. 

If we define Z M as the ring of integers modulo M, 
then if in a ring of integers multiplicative inverses exist 
for all non-zero integers, this ring is called a field. 

A field with a finite number of elements is also 
called a Galois field. It is clear then, that Z„ is a 
field if and only if M is a prime. 

Z^ is not a field for a non prime M, since not all 

elements will have multiplicative inverses. Also, there is 

/ 

no primitive root that will generate the entire ring, (only 
subsets with 4 > (M) elements, at most.) 
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Example : 



M = 8 = 2' 



<MM) = 8(1 - |) » 



N 



0 



1 



_N 



1 



3 N 

4 N 



1 

1 



5 N 

6 N 

7 N 



1 

1 

1 



1 2 

1 1 

2 4 

3 1 

4 0 

5 1 

6 4 

7 1 



3 4 
1 1 
0 0 
3 1 
0 0 
5 1 
0 0 
7 1 



5 6 7 
111 
0 0 0 
3 13 
0 0 0 
5 15 
0 0 0 
7 17 



4 



subset with 4 elements 
root of order N = 2 
subset with 3 elements 
root of order N = 2 
subset with 4 elements 
root of order 2 



In these conditions, the previous example shows, that 
there are multiplicative inverses, only for those elements 
in Zg relative primes to M = 8. 

We can investigate, a little more, the arithmetic 
modulo M, when M is not a prime. Let M have the following 
unique prime power factorization (2.11) 




when the arithmetic is done mod M, it is in effect done 

r . 

modulo each prime power 1 simultaneously [13] . 

A set of arithmetic operations can be done either 
r . 

modulo each p^ separately and the final result mod M 
obtained using the Chinese remainder theorem [13] , or 
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alternatively all the operations may be done mod M, but 

r i 

they must be valid operations mod for each p^ 

In these conditions, an integer a is said to be of 
order N in if and only if it is of order N in each 




Here we present some basic results. 



a = b mod M 



is true if and only if 

r . 

a = b mod p^ 1 i = l,2,...5, (2.23) 

r i 

If we know the residues of an integer a modulo each p^ 

We can uniquely reconstruct the integer a mod M using the 
Chinese Remainder Theorem. 

To establish this theorem, let 



a 



a i 



mod p 



d. 

1 



A 



r . 

M/(p ± 1 ) 



and 




r . 



(d i 



mod p . 1 ) ^ mod p 



r . 

l 



(2.24) 



(2.25) 



(2.26) 



35 



then 



P 

a = ( T d. d." 1 a.) mod M (2.27) 

L 1 1 i 

i=l 

Example: 

Let a = 123 and M = 24 = 2^x3 

Calculation of a^,s: 

3 

a = 123 = a^ mod 2 a^ = 3 mod 8 

a = 123 = a 2 mod 3 a^ = 0 mod 3 

Calculation of d^,s: 

r l 3 

d ± = M/p ± = 24/2 = 3 mod 8 

r 2 

d 2 = M/p 2 = 24/3 = 8 = 2 mod 3 

-1 

Calculation of d^'s: 

, -1 A , r l,-l , 0 ,-l _4> (8) -1 0 3 , , 0 

d^ = (d^ mod p^ ) = (3 mod 8) =3 =3 =3 mod 8 

, -1 A ,, , r 2,-l /0 , _.-l _M02 _3-2 „ , , 

^2 = ^2 m0< ^ ^2 = (2 mod 3) =2 =2 =2 mod 3 

then 
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a 



2 

= ( l d i d i 1 a i> mod M 

i=l 

a = (3x3x3) + (2x2x0) = 27 = 3 mod 24 

Check : 

a = 123 = 3 mod 24. 
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III. TRANSFORMS IN FINITE FIELDS 



The basic operations of signal filtering is convolution. 
In the discrete time signal processing situation, convolution 
takes the form 



y(n) = x(n) * h(n) = £ x (m) h (n-m) (3.1) 

n=-co 

n ■* 0 , 1, 2 , ... 

Where h(n) is the response to a unit impulse, for a causal 
filter, then 



h (n) = 0 for n < 0 

and, when the duration of the impulse response is finite, 
the infinite sum (3.1) reduces to a finite sum 

N-l 

y(n) = £ x(m) h(n-m) (3.2) 

m=0 

where N is the length of the finite impulse response 
(FIR) filter. 

The processing workload required to evaluate (3.2) 
can be reduced significantly if direct computation is 
replaced by transforms methods. This is so provided the 
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application in question allows sequences to be processed 
in blocks. 

The Discrete-Fourier Transform (DFT) is one of the most 
versatile transforms, and 

DFT = X(K) = 



and its inverse transform 
I DFT = x (n) = 



where N is the length of the sequence which transforms one 
wants to calculate. 

In order to use the DFT in the high-speed implementation 
of convolution, one makes use of its cyclic convolution 
property, which states that 

DFT [h (n) * x (n) J = DFT [h (n) ] • DFT [x (n) ] (3.5) 

and this implies that convolution can be implemented using 



is defined by 



N-l 
l x(n) e 
n=0 



• #2 TT . _ _ 

-3 (ij-)nK 



(3.3) 



K = 0, 1, ... N-l 



by 



N-l . ,2tt. 

1 r 3(“N-)nK 

i I X(K) e 
K=0 

m = 0 , 1, ... N-l 



(3.4) 
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IDFT{DFT[h(n) ] • DFT [ x ( n ) ] } 



(3.6) 



y (n) = 

The convolution implemented by (3.6) is called cyclic convo- 
lution since it evaluates (3.2) as if h(n) and x(n) were 
periodically extended outside the range [0,N-1], or equiva- 
lently, the indices were evaluated modulo N. Notice that 
normal finite convolution can be calculated by cyclic 
convolution if zeros are apended to x(n) and h(n) to prevent 
folding or aliasing [34] . 

The DFT is a transform of finite sequences, of real or 
complex numbers. One might ask if there are transforms in 
other number fields. That is, given a sequence of numbers 
modulo M, is there a transform that has the cyclic convolu- 
tion property?- One will see that the answer to this question 
is yes. 

If one has a sequence of numbers of length N, then a 
transform of the form given by the pair 

N-l 

X(K) = l x (n) a nK (3.7) 

n=0 

N-l 

x (n) = jjj- l X (K) a nK (3.7a) 

K=0 

is said to have a DFT structure [34] . 
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If one looks for the properties that a general trans- 
form (3.1) having the DFT structure must have to exhibit 
the cyclic convolution property, one finds [4] , [35] , that 

it depends on the existence of an a that is a root of unity 
of order N, i.e. , 



N 

a = 1 (3.8) 

and that N ^ ^ exists, i.e., the inverse of N exists. 

It has been shown [35] that i^the complex number field the 
. • — ^ n 

conventional DFT with a = e is the only transform, with 

that property. However, by working in a finite field or 
ring of integers with arithmetic carried out an integer 
M, a large class of transforms exist [14] that have the 
cyclic convolution property. By special choices of the 
length N, the modulo M, and the value a, it is possible to 
have transforms with many interesting properties. These 
are called number theoretic transforms. 

The possibilities of interest are, [14]: 

1) the ring of integers Z^, with respect to modulo M 

2) the field of integers GF(p), with respect to prime 
modulus p 

3) the Galois field GF(p^) of p^ elements. 

Let represent the ring of integers {0,1 ... M-l} (2.4) 
with arithmetic carried out mod M. Let M have the following 
unique prime power factorization: 
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M 



( 2 . 11 ) 





r 



P£ 



£ 



where the ' s are distinct primes. 

As pointed out previously (section II) , when one carries 

arithmetic mod M, one is in effect doing it modulo each 
r . 

p^ 1 simultaneously. Therefore, the length N number theoretic 
transform, having the cyclic convolution property in Z^, 
must also have that property in 



-i 1 



for i = 1, 2, ... £. 



This requires that a (mod p^ 1 ) , an integer of order N 
must exist in , i.e., N is the least positive integer 



such that 



Pi 



a N = 1 (mod p^i), i = 1,2, ... £ (3.9) 



Example 

then 



Let M = 24 = 2 x 3 



Z 2 — — (0,1, 2, 3, 4, 5, 6, 7} 



and 



Z 3 = {0,1,2} 
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For (3.9) to be satisfied, since 
i) for Z 3 



N 


0 


1 2 




(3.10) 


i N 


1 


1 1 






2 N 


1 

^ 


2 1 


, i.e. , a = 1 


order N 






order 2 


a = 2 


order N 



and 



ii) for Z, 



N 

1 

2 



.11111111 
124 0 0000 



(3.11) 



i.e., a = 1 order N = 1 



Then, a = 1 (mod 24) is a root 

length of the number theoretic 

ring Z 24 , is N = 1 (not a very 

Furthermore, since the inverse 

existence of the inverse of N, 

exist in Zj, , or N should be 
Pi 1 

In this example, 

(i) for Z^ 



of order N = 1, thus the 
transform possible in the 
interesting result!), 
transform requires the 
i.e., N 1, this number should 
relatively prime to M (2.21). 
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one wants N 



? 



-1 




By (2.21a) 



.."1 ,“1 i 3-2 ,1 , , 

N =1 =1 =1=1 mod 3 



(ii) for Z. 



also required isN^=l^=? 



By (2.22) 





(mod 8 ) 



1 4 >( M )-1 



but 



<j) (M= 8 ) = 8(1 - |) = 4 



then 



N 1 = 1 1 (mod 8 ) = l 4 1 = l 3 = 1 (mod 8 ) 

Thus we have verified the existence of a number theoretic 
transform of length N = 1, in the ring of integers Z ^ 4 * 

In general, the existence of an of order N, each Z jfi 
can be investigated recalling Euler's theorem (2.13). 

That is, by observing (3.4) and Euler's theorem, one has 
the condition 
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l — X f 2 , « « « , 



(3.12) 



N|4>(P i 1 ) 

r . 

i.e. , N should divide 4> (p^ )• 

Example: Let M = 11 x 31 = 341 

Then 

— { 0 / 1 / 2 / . • • , 10 } 

and 



2^ {0,1,2, . . . , 30} 

i) for Z^ 

cf) ( 11 ) = 11-1 = 10 , implies possible values of 

order N for the roots in 
Z u (1,2,5,10) 

ii) for 

<}) (31) -= 31-1 = 30 , implies possible values of 

order N for the roots in 
Z 31 (1,2,3,5,6,10,30) 

That is, for (3.12) to be satisfied, in the conditions given 
by i) and ii) , the possible values for the length of the 
NTT in Z 341 are (1,2,5,10). 
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Notice that by (3.9), whatever the root in consideration. 



it has to be the same order in as that in 

For Z one constructs the following table. 




root of order 1 
10 
5 
5 
5 

10 
10 
10 
5 

root of order 5 



(3.12a) 



For Z^, a similar table is constructed, from which only 
a part is shown (notice that only N = 1, 2, 5, 10 are of 
interest) . 



N 


0 


1 


2 


5 


10 










i N 


1 


1 


1 


1 


1 


root 


of 


order 


1 


2 N 


1 


2 


4 


1 


1 


root 


of 


order 


5 


3 N 


1 


3 


9 


26 


25 


root 


of 


order 


30 


4 N 


1 


4 


16 


1 


1 


root 


of 


order 


5 


5 N 


1 


5 


25 














io N 


1 


30 


1 


30 




root 


of 


order 


2 



Thus the root a = 4 is of order 5 in both Z^ and Z 31 , 
and so the length of the NTT in the ring Z 34 can be equal 
to 5. Furthermore, since the multiplicative inverse of an 
integer b in Z^ exists if and only if b and M are relatively 
prime , and for the inverse transform one requires N ^ , N 
should be relatively prime to M (or p^'s). 

In the last example: 

N = 5 is relative prime to N = 243 so 5 1 mod 341 
exists and is given by 

. 5 *<341)-1 mQd 341 



but 



<f> (341) = 341(1 - yy) (1 - yy) = 300 



SO 



5 1 mod 341 = 5 300 " 1 mod 341 = 5 299 mod 341 



To find this number notice that 

299 = 256 + 32 + 8 + 2 + 1 = 2 8 + 2 5 + 2 3 + 2 1 + 2° 



so 



.299 



= 5 



256 c 32 -8 
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But 



Ul 

to 

!! 


25 mod 341 


5 4 = 


284 mod 341 


5 8 = 


180 mod 341 


5 16 = 


5 mod 341 


5 32 = 


25 mod 341 


5 64 = 


284 mod 341 


5 128 


= 180 mod 341 


c 256 

D 


= 5 mod 341 



Then 

5 2 " mod 341 = (5 • 25 • 180 • 25 * 5) mod 341 
= 273 mod 341 



Check : 



(5 x 273) mod 341 = 1 mod 341. 



So 

5 _1 mod 341 = 273 mod 341. 
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That is, it is verified that a number theoretic transform 
of length N = 5 exists in Now ' the condition N 

relative prime to M (or p^'s) means 



N| ( Pi - 1) , i = 1, 2 ,...,£ (3.13) 



i • e • 



N | gcd {p^l, P 2 -1 / — / • 



0(M) is defined as the greatest common divisor (gcd) 
of the (p^ - 1) 

0 (M) = gcd {p 1 ~l, P 2 -1 / # P^-l) • (3.14) 

Therefore 



N|0(M) 



(3.15) 



This last equation gives the necessary condition for the 
existence of a transform of length N in the ring Z N with 
arithmetic carried out an integer M. 



Example: Let M = 11 x 31 = 341 

0 (M) = gcd {11-1, 31-1} = gcd{10,30} = 10 
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Notive that N = 5 satisfies the condition (3.15) since 
N = 5 1 (0(M) = 10.) 

It remains to be investigated further whether or not a 

given a of order N = 10 exists in both and Z^. 

Now, consider the converse of condition (3.15). If 

r . r . 

N|0(M) or N|4>(Pj_ 1 ) , then there exists integers ou (mod 1 ) 

or oder N in Z r [13] . 

P r . 
Using these a^'s one can construct transform (mod p^ x ) 

which have the DFT structure (3.7) and are invertible. 

Combining these transforms by the Chinese remainder 

theorem (2.27), one can obtain a transform (mod M) having 

the cyclic convolution property in Z M - Alternatively one 

can combine the cu ' s by the Chinese remainder theorem to 

obtain an a (mod M) of order N in Z^ and construct the 

final transform using this a. The results will be identical. 

Example: Let M = 5 x 17 = 85 



<j) (5 ) = 5-1 = 4 possible values of N: 1,2,4 



<p (17) = 17-1 = 16 possible values of N: 1,2,4,8,16 



If one looks for an a of order 4 in Zg^., by constructing 
tables for Z 5 and Z^ as those in (3.12 a,b) , one finds 
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2 mod 5 (order 4, in Z^) 



a 2 = 4 mod 17 (order 4, in Z ^_) 

Using the Chinese remainder theorem (2.27): 

(i) d^ ' s calculation 

d^ = (5xl7)/5 = 17 mod 5 = 2 mod 5 

d 2 = (5 x 17) /17 = 5 mod 17 

-1 

(ii) Calculation of d. 's. 

l 

d x = (2 mod 5)" 1 = 2 5-2 mod 5 = 2 3 = 3 mod 5 (by 2.21a) 

/ 

d 2 -1 = (5 mod 17) _1 = 5 M-2 = 5 17-2 = 5 15 = 7 mod 17 (by 2.21a) 

Then (by (2.27) 

a = ((2x2x3) + (4x5x7)) mod 8 5 

a = (12 + 140) mod 85 

a = 152 mod 85 = 67 mod 85 

Check: 

67^ = mod 85 
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Notice that a = 64 mod 5 is of order 4 in Z^, and also 
a = 64 mod 17 is of order 4 in 

To establish the existence of a NTT in Zg,., with length 
N = 4, one has to find the inverse of N = 4 ^ , i.e.. 



By (2.22) 



1 . <j> (85) -1 , . /oc\ 

\ = 4 T and <J> ( 85 ) = 



85(1 - |) (1 - Yf) 



= 64 



So 



.-1 



= 4 



64-1 



= 4^ 2 m0( j 85. 



Since 



63 = 32+16+8+4+2+1 



4 63 



.32 .16 

4 • 4 




• 4 



1 



But 

4'*' = 4 mod 85 

4 2 = 16 mod 85 

4 4 = 1 mod 85 
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1 mod 85 



4^ = 1 mod 85 
4 32 = 1 mod 85 

SO 

4 _1 = 4 63 = 1 * 1 * 1 ' 1 • 16 • 4 = 64 mod 85 



Check: 



4 x 64 = 1 mod 85. 

The fact that condition (3.15) holds as well as its 
converse [13] means that (3.15) is the necessary and 
sufficient condition for the existence of an invertible 
transform of length N which has the cyclic convolution 
property, mod M. 

This also establishes that the maximum transform length 

in Z., is 
M 



N v = 0 (M) (3.16) 

max 

Notice that for a given modulus one knows exactly what are 
the possible transform lengths in Z^. 

For any NTT to be computationally efficient, there are 
three main requirements [17J : 
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(i) N, the transform length, should be highly composite 
(preferably a power of 2) for an FFT-type algorithm 
to exist, and should be large enough for practical 
sequence lengths. 

(ii) The multiplication by powers of a (3.7) must be a 

a simple operation. This is possible if the powers 
of a have a binary representation with few bits . 

(iii) To simplify the arithmetic modulo M, M should have 
a binary representation with very few bits and 
be large enough to prevent overflow. 

Although the class of all possible number theoretic trans- 
forms seems very large at first consideration, closer 
examination shows that very few seem to satisfy the afore- 
mentioned criteria. The parameters that must be chosen are 
M, N and a. 

Briefly, one verifies that if M is even, it has a factor 

of 2 and, therefore, 0(M) and N are 1, which implies 

max ^ 

M should be odd. If M is prime then 0(M) = M-l which is as 
large as one could hope for in a field of M integers. 

In section IV, the various types of NTT are discussed, 
but one can say that conditions ( (3 . 8) , (3 . 16) ) do not give 
a systematic way of determining the "best" choices. As a 
result one must use intuition, insight and a bit of searching. 
Usually M is selected and the resulting possible N and a 
are then examined. 
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IV. NUMBER THEORETIC TRANSFORMS 



One computational need in the digital processing of 
signals is the evaluation of the circular convolution 
summation of two sequences of length N. That is, the 
evaluation of 

N-l 

y(n) = £ x(n-m) h (m) (4.1) 

m=0 

n = 0,1, ... N-l 

The so-called fast convolution procedure obtains this 
sum by taking the inverse transform- of the product of the 
transforms of the two sequences. If the transform used is 
the discrete Fourier transform, then error-free results 
are obtained only if infinite precision arithmetic is 
assumed. This is true, even if both sequences are composed 
of finite precision numbers because the Fourier transform 
involves the irrational number exp [- j (2 tt/N) ] . 

One way to avoid the round-off errors induced by the 
transform is to make use of the fast transforms over the 
finite field [14J . 

By working in a finite field or ring of integers with 
arithmetic carried out modulo an integer M, a large class 
of transforms exist that have the cyclic convolution property 
(3.5). By special choices of the length N, the mod M, and 
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the value of a, it is possible to have transforms that need 
only word shifts and additions but no multiplications, that 
have an FFT type fast algorithm, that do not require 
storage of complex values of a, and that have no roundoff 
errors. These transforms are the Number Theoretic Trans- 
forms (NTT) and they look very promising in the evaluation 
of finite convolutions. Their main disadvantage seems to 
be a relation of the sequence length N to the required word 
length that can require long word lengths for long sequences. 

This section begins by discussing Mersenne and Fermat 
number transforms, that proceeds historically all subse- 
quent work in this field. In part B, other number theoretic 
transforms that provide more choices of word lengths and 
transform lengths than Mersenne and Fermat number transforms 
are discussed. 

A. MERSENNE AND FERMAT NUMBER TRANSFORMS 

Rader [3] suggests performing the calculations of a 
transform with the DFT structure (3.7), in the ring of 
integers modulo a Mersenne number. Such numbers are defined 

\\ 

by [3] 

p = 2 q - 1 (4.2) 

where q is prime, but p is not necessarily prime. 

CT th 

In the ring p = 2--1, a = 2 is a root of the q order 

since 
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2 q 



(2 q - 2 q + 1) 



1 mod 2 q -l (4.3) 



Under these conditions, a Mersenne transform of an 
integer sequence (a n ) having q terms is defined by [3] 



q -l 

A k = ( l a n 2 nK ) mod p (4.4) 

n=0 

K = 0 , 1 ... q . 

Because q has an inverse modulo p [3] , the inverse Mersenne 
transform will be 



q-l 

a m = (q 1 l A k 2 mK ) mod p (4.5) 

K=0 

m = 0,1 . . . q-l 

where all exponents and indices being taken modulo q and 
all operations being performed modulo p in both (4.4) and 
(4.5) . 

It can be demonstrated [3] that the Mersenne transform 

satisfies the convolution theorem; that is to say, if 

{X } is the Mersenne transform of {x }, then with Z. = A -X^ mod p, 
i\ n K a a 

the Inverse Mersenne transform (Z m ) of {Z K } is given by 

q-l 

Z m = (I a x ) mod p (4.6) 

m u n m- n 

n=0 
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If (a n ) and {x n > are properly bounded [3] , is equal 
to the output of the ordinary cyclic convolution with 



Z 



in 



q-1 



l 

n=0 



a 

n 



x 



m-n 



(4.7) 



Under these conditions, digital filtering of real integer 
sequences can be performed by dividing the sequences into 
blocks, padding the blocks with zeros [34] to prevent folding, 
and aliasing and computing the cyclic convolutions by means 
of Mersenne transforms. 

The number of transform terms can be extended to 2q, 
since a = -2 mod p is a root of order 2q.: 

a 2q = _ 2 2q = (_ 2 q ) 2 = (-2 q + 2 q - l) 2 = (-1) 2 = 1 mod 2q-l 

(4.8) 

Example : Let q = 7 

Then 



P = 





127 



and 



a = -2 mod 127 = (-2 + 127) = 125 mod 127. 



Notice that 2q = 14, so 
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(- 2 ) 



2q 14 

a M = a 



14 



= 125 



14 



mod 127 



Now, 



14 = 



3 2 1 

2 + 2 + 2 



= 8 + 4 + 2 



Thus 



but 



and 



125 14 = 125 8 • 125 4 • 125 2 



125 2 = 4 mod 127 

125 4 = 16 mod 127 

125 8 = 2 mod 127 



125 



14 



125 



14 



2*16-4 mod 127 
128 mod 127 
1 mod 127. 



Notice also that the inverse of 2q = 14, mod 127 is 



-1 

a 



-1 127*2 125 

14 mod 127 = 14 = 14 mod 127 (by 



.21a) . 
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By performing some simple calculations, 



14 -1 = 118 mod 127. Check: 14 x 14 _1 = 14 x 118 = 1 mod 127. 

Thus, a Mersenne transform exists which has the cyclic 
convolution property for sequences of length N = 2q, with 
a = r2 replacing exp[-2irj/N] as the Nth primitive root of 
unity in (3.3), and with all calculations in (3.7) done in 
arithmetic modulo p. 

Rader advocated such a transform since using a = 2 
or a = -2 as a root of unity would necessitate only shift 
and add operations in computing the transform (3.7). 

Because Mersenne transforms are evaluated without 
multiplications, computation of a time-invariant circular 
convolution (4.1) having N = q points reduces to one multi- 
plication per output sample, as opposed to q multiplications 
with direct calculation. To compute a FFT of a sequence 
of length N = q requires of the order of (N/2) log 2 (N/2) 
complex multiplications [17] . 

The main limitations of Mersenne transform approach are 
related to the fact that the number of transforms terms 
q (or 2q) is not highly composite, since q is a prime. This 
means that calculations of the transforms cannot be simpli- 
fied by an FFT- type algorithm. 

K K 

If one considers p = 2 +1 and K odd, 3 divides (2 +1) 

and the largest possible transform is 2, thus one considers 
only K even. 
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t K 

Let K = s 2 , s odd, t an integer. Then since p = 2 + 1, 



one has 



2 s2 + 1 



2+1 



n / n an integer. (4.9) 



And the length of possible transforms will be governed by 
the length possible for 2 +1 (see, 3.15). Therefore 

integers of the form 



2 

p = 2 + 1 (4.11) 

t = 0/1/2/3... 

are of interest. These numbers are called Fermat numbers and 
are defined by F^ = p in (4.11). For t=0 to t=4 the Fermat numbers 
are prime. For t > 4 there are no known Fermat number prime. 

Number theoretic transforms with a Fermat number as 
the modulus, are called Fermat number transforms (FNT) . 

By (3.15), for the FNT of length N to exist N must 
divide 0 (F t = p) . 

Notice that (by 3.16), for F fc prime 

N = 0 (F ) = 2 b , b = 2 t (4.12) 

max t 

and one can have FNT, for any length 

N = 2 m , m < b (4.13) 
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Example 



Let p 



17 



= F 2 = 2 2 +i = 2 4 + 1 = 

If one constructs the following table: 







N 


0 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


11 


12 


13 


14 


15 


16 


order 


1 


i N 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


order 


8 


2 N 


1 


2 


4 


8 


16 


15 


13 


9 


1 


2 


4 


8 


16 


15 


13 


9 


1 


order 


16 


3 N 


1 


3 


9 


10 


13 


5 


15 


11 


16 


14 


8 


7 


4 


12 


2 


6 


1 


order 


4 


4 N 


1 


4 


16 


13 


1 


4 


16 


13 


1 


4 


16 


13 


1 


4 


16 


13 


1 


order 


16 


5 N 


1 


5 


8 


6 


13 


14 


2 


10 


16 


12 


9 


11 


4 


3 


15 


7 


1 


order 


16 


6 N 


1 


6 


2 


12 


4 


7 


8 


14 


16 


11 


15 


5 


13 


10 


9 


3 


1 


order 


16 


7 N 


1 


7 


15 


3 


4 


11 


9 


12 


16 


10 


2 


14 


13 


6 


8 


5 


1 


order 


8 


8 N 


1 


8 


12 


5 


16 


9 


4 


15 


1 


8 


12 


5 


16 


9 


4 


15 


1 


order 


8 


9 n 


1 


9 


3 


15 


16 


8 


4 


2 


1 


9 


3 


15 


16 


8 


4 


2 


1 


order 


16 


10 N 


1 


10 


15 


14 


4 


6 


9 


5 


16 


7 


2 


3 


13 


11 


8 


12 


1 


order 


16 


11 N 


1 


11 


2 


5 


4 


10 


8 


3 


16 


6 


15 


12 


13 


7 


9 


14 


1 


order 


16 


12 N 


1 


12 


8 


11 


13 


3 


2 


7 


16 


5 


9 


6 


4 


14 


15 


10 


1 


order 


4 


13 N 


1 


13 


16 


4 


1 


13 


16 


4 


1 


13 


16 


4 


1 


13 


16 


4 


1 


order 


16 


14 N 


1 


14 


9 


7 


13 


12 


15 


6 


16 


3 


8 


10 


4 


5 


2 


11 


1 


order 


8 


15 N 


1 


15 


4 


9 


16 


2 


13 


8 


1 


15 


4 


9 


16 


2 


13 


8 


1 


order 


2 


16 N 


1 


16 


1 


16 


1 


16 


1 


16 


1 


16 


1 


16 


1 


16 


1 


16 


1 



(4.13a) 
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one sees that (3,5,6,7,10,11,12,14) are primitive roots 

that will generate the entire field Z The root a = 2 

is of order 8 and a 2 = 2 2 = 4 is of order 8/2 = 4 (by 2.20) . 

2 

Also note that 11 = /2, in the sense that 11 =2 mod 17 . 

That is, for the ring one has the possibility of 

choosing a FNT of lengths (1,2,4,8,16) thus satisfying 
(4.12) and (4.13). 

For digital filtering applications the composites 
F 5 (b = 32) and Fg (b = 64) seem to be practical [4] . 

Lucas [6] has proven that every prime factor of a composite 
F t , is of the form 

K 2 t+2 + 1 (4.14) 

t+2 

Therefore, 2 divides 0 (F fc ) , for t > 4. 

2^ 

Example: Consider Fg = 2 +1=4 294 967 297 

= 641 x 6 700 417 

0(F 5 ) = g.c.d.{ (641-1) , (.6700 517-1)} 



by 



0(F 5 ) = g. c.d. { 640 , 6 700 416} 



640 = 2 7 • 5 



6 700 416 = 2 7 • 3 * 17449 
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Then 



0 (F c ) = 2 7 = 128 = 2 5+2 where t = 5. 

D 



Therefore, for the choice of Fermat numbers with t > 4, 
the maximum possible transform length is given by (see, 3.16) 

N m=v = 0 (F.) = 2 t+2 = 2 2 * 2 fc = 4b (4.15) 

max t 



where 

t > 4, b = 2 b . 
Agarwal [4] proved that 



a 



2 b /4 (2 b/2 . u 



(4.16) 



is a root of order 4b, in Z n , t > 2. 

F t 

Notice that 

ct 2 = [2 b/4 (2 b/2 - l)] 2 = 2 b/2 (2 b/2 - l) 2 

o 2 = 2 b /2 (2 b - 2-2 b/2 +1) = 2 b/2 (-2 • 2 b/2 ) 

a 2 = (-2) 2 b 



Thus 
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a 2 = 2 mod 2 b + 1. 

2 

Since a = 2 mod F^, the notation a = /2 for (4.16) 

t 

will be adopted in this thesis following a general procedure. 

Also, note that any odd power of /2 will also be of 
order 2 t+2 (by 2.20), i.e. 

a d = , d odd (4.17) 

t+2 . r~ 

is a root of order 2 . And raising a = /2 (4.16) to 

,t + 2-m) m 
2 1 ' power one obtains an integer a ' of order 2 , 

m < t+2, i.e. 



9 ( t+2-m) 

a = "a ' (4.18) 

a' being a root of order 2 m , m <_ t+2. 

2 2 4 

Example: Let p = = 2 +1 = 2 +1 = 17 

i.e. 



b = 





4 



Then 



a = 2 b/4 (2 b/2 -l) 



2 4 / 4 ( 2 4 / 2 -!) 



2(2 2 -l) 



a = 2(3) = 6 mod 17 
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is a root of order 4b = 4*4 = 16. Observing (4.13a) one 
sees that this result is correct. 

Notice also in (4.13a) that 6 = /2 in the sense that 
6 2 = 2 mod 17. So 



a = /2 = 6 mod 17. 

If one raises this a to 2 ^ t+2 , m = 3 <_ t+2 = 4 

i.e. 



2 ( 2+2-3) = 2 

a 2 = 6 2 = 2 mod 17 

and a = 2 is a root of order 2 m = 2 2 = 8 (4.18) 

Observing (4.13a) one verifies that this is also a 
correct result. 

If one raises a = 6 to an odd power say d = 5, 

5 5 

a = 6 =7 mod 17 is a root of the same order as a = 6, 

namely of order 4b = 16, verifying (4.17). Observing (4.13a) 
one verifies that this is correct. 

Notice that a = 2 mod (2 D +1) is of order N = 2b, since 

2 2b = 2 2 - 2t = 2 2t+1 



but 
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(4.19) 



2 2b = (2 b ) 2 = (-1) 2 = 1 mod (2 b + 1) 

Thus for FNT's with a prime or composite modulus one sees 
that a = 2 (or a power of 2) is a possible root of order 
N = 2b = 2 • 2 b = 2 t+1 . 

This means that sequences up to a length N = 2b, in 
this case, can make use of FNT. 

This is a very desirable situation, since N = 2 t+ ^ 
is highly composite allowing an FFT type algorithm and all 
multiplication by powers of a are simple word shifts. 

2 

if a = /2 is used then sequences of length N = 4b = 2 

are possible (4.16). Recalling that Fermat numbers up to 

F^ are prime 0(F(t)) = 2 a (by 3.14), and in these cases one 

can have an FNT for any length N = 2 m , m _< b. 

Notice that, for these Fermat numbers, a = 3, is a 

root of order N = 2 (see 4.13a), allowing the largest 

possible length, in the correspondnet ring Z . 

F t 

The following table shows some parameters for several 
possible implementations for FNT's: 





t 


b = 2 t 


F t = 2 b + l 


N = 2b 
(a = 2) 


N = 4b 
(a = J2) 


N 

max 


a for 
N 

max 



2 


4 


2 4 + 1 


8 


16 


16 


3,5, 6,7, 
10,11,12,14 


3 


8 


2 8 + 1 


16 


32 


256=2 b 


3 


4 


16 


2 16 + 1 


32 


64 


65536=2 b 


3 


5 


32 


2 32 + 1 


64 


128 


128 




6 


64 


2 64 + 1 


128 


256 


256 
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That is, a = /2 and the resulting N = 4b give the maximum 
length possible for F,. and Fg. However, for prime F t , 
further increases in N are possible up to N = 2 if more 
stages of the FFT algorithm are allowed to have multiplica- 
tions rather than simple word shifts. 

Notice also that besides a = 3 being of order N = 2 

b” 1 b 

there are 2 - 1 other integers of order 2 , since: 

<j>(M) = (2 b + 1) - 1 = 2 b , 

and the number of primitive roots in Z p is given by (see 

t 

section II) 



<}>(<}) (M)) = 2 b (l - -|) = 2 b 1 . 

For F 2 = IV, one sees that the number of primitive 
roots is 



2 4 " 1 = 8 (3,5,6,7,10,11,12,14) . 



The cyclic nature of modular arithmetic means that, with- 
out a priori knowledge, integers cannot be associated with 
magnitude. For example, the days of the week represent a 
modulo 7 system, so that the statement "Friday is after 
Wednesday" has no meaning unless the week for the Friday 
and Wednesday in question is specified. 
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This means that any number theoretic transform unlike 



the DFT has no physical significance like "frequency." 

It is merely a transform space, the halfway stage in convolu- 
tion. 

During various stages of the computation of an NTT, 
each accumulation of signal "overflows" many times. 

But still the end result of the convolution will be 
exact if the input signals are properly bounded [17] . That 
is, a dynamic range constraint is imposed by the modular 
arithmetic. One must be able to bound a priori the result 
of convolution in order to determine the true answer from 
the answer modulo F . 

The worst case bound is determined by the following 
procedure. If 



c 



n 





means circular convolution of length N, and 




(4.21) 





then 



c 



< N 2 




(4.22) 



n 
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Example : 

The circular convolution of two sequences requiring 8 
bits plus sign, will require at most, 22 bits plus sign to 
represent the output sequence. Since 



. 7 _ c . _ 0 6 | | -6 0 8+8 
N = 64 = 2 c < 2 • 2 

i n 1 — 



b a " b b = 8 l c „l i 26 ‘ 216 



c < 2 

n 1 — 



22 



Arithmetic modulo can be implemented using b = 2 fc 
bit representation of integers with some provision for 
representing 2^. 

Section V deals with the implementation of Fermat Number 
Transforms, where arithmetic is carried modulo F = 2 +1, 

b = 2 t . 

Notice that the maximum length of sequences which can 
be cycled convolved using the FNT with a = 2 is N = 2b 
(N = 4b for a = / 2 ) , and therefore the length of sequences 
which can be convoled is proportional to the word length 
in bits (b) . 

Thus for long sequences the word length requirement 
may be excessive. 

Rader [3] suggested using a two dimensional convolution 
scheme to convolve long one dimensional sequences and 
Agarwal and Burrus [17, [18] presented such a two dimensional 
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convolution scheme. Using this scheme, cyclic convolution 
of length N = LP is implemented as a two dimensional cyclic 
convolution of length 2LXP. 

This two dimensional cyclic convolution can be imple- 
mented using a two-dimensional FNT. Then the word length 
required is proportional to the square root of the length 
of the sequences to be convolved [13] , which would give 
for a maximum sequence length 8b rather than 4b (for a = /2) , 
ie. , for a computer's word length b = 64, the maximum length 
for the transform will be 

N = 8b 2 = 32 768. 

max 

Appendix A illustrates with an example such a scheme and 
also several other points: treatment of negative values in 

data, the structure of the transform and the inverse matrix, 
negative powers of a, frequent "overflow" during computa- 
tions, meaninglessness of the transform values and exactness 
of the final answer. This example will not demonstrate the 
efficient implementation of the FNT using the binary 
arithmetic. 

B. OTHER NUMBER THEORETIC TRANSFORMS 

Mersenne and Fermat number transforms are very promising 
for digital filter computation because they can be calcu- 
lated without multiplications. Their main drawback is a 
rigid relationship between transform length and wordlength. 
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caused by the fact that all operations are performed in a 
finite ring with arithmetic carried out an integer M. 

Another difficulty arises because it is not possible to 
achieve simultaneously optimum efficiency in reducing the 
number of operations and in implementing arithmetic opera- 
tions. This is so because Fermat number transforms are 
amenable to a fast transform algorithm, and Mersenne trans- 
forms are not, whereas arithmetic operations can be imple- 
mented more efficiently modulo a Mersenne number than 
modulo a Fermat number [3] , [13] . 

In what follows, other number theoretic transforms will 
be described briefly. Such transforms provide more choices 
of word length and transform length, thus enlarging the 

possibility of the use of NTT in digital filtering. 

2 

1 . Transforms Over the Galois Field GF(p ) 

Reed and Truong [20] generalized the ideas of 
number theoretic transforms to transforms over the Galois 
Field GF (p^) where p is a prime Mersenne number, i.e. 

p = 2 q - 1 , p = 2, 3, 7, 13, 19, 31, 61 ... 

(4.23) 

Notice that this is a particular case of the definition (4.2), 

in the sense that here one is interested only in p, a prime. 

2 

Also, the Galois field GF(p ) is a particular case of the 
more general GF(p n ), where one is interested in the case 
n = 2. 
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2 

Let GF (p ) denote the Galois Field (finite field) of 

p elements, where p = 2^ - 1, p.and q primes. Let d be 
2 . 2 

a divisor of p - 1 (possibly d = p - 1) . Also let the 

2 

element r e GF (p ) , generate the cyclic subgroup of d 
elements , 



G d = (r,r 2 ... r d_1 , 1) (4.24) 

Then a transform over this subgroup G^ can be defined by 
the following pair [20] 

d-1 

A ' = J a r Kn , for 0 < K < d-1 (4.25a) 

K L n — — 

n=0 



and 



d-1 

a m = ( d ) — 1 l A k r _Km , for 0 < m < d-1 (4.25b) 

K=0 

2 2 
where d divides p - 1, a and A, are elements of GF (p ) 

m K 

and r is a generator of the element subgroup G^. 

It can be shown that the cyclic convolution property 
holds for this transform [20] . 

Now, if 



2 



x 



-1 mod p 



(4.26) 
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is not solvable, then the nonsolvability of (4.26) is 
equivalent to the statement: 

(-1) is a quadratic nonresidue mod p 



(see Appendix B, eq. B-2) . 

By Euler's criterion (Appendix B, theor. B.3), this is 
further equivalent to: 






= (-1) ( P -1) / 2 = (-!) t(2 q -l)-l]/2 



= ( _ 1 } ( 2 q - 2)/2 



(4.27) 



(- d * 29 -. 11 = -i 



where 



(— ) is the Legendre symbol defined by [see Appendix 

ir 

B, eq. (B. 8) ] 






+1 if a is a quadratic residue mod p 



-1 if a is a quadratic nonresidue mod p 



Thus (-1) is a quadratic nonresidue mod p and (4.26) is 
not solvable in GF(p); the polynomial 



p(x) = x + 1 



(4.28) 
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(4.29) 



is then irreducible in GF(p). A root say, i, of 
p(x) = x 2 + 1 = 0 

2 

exists and can be found in the extension field GF (p ) [20]. 

2 

The Galois field GF (p ) is composed of the set 

GF (p 2 ) = {a + ib | a, b e GF (p) } (4.30) 

A 

where i is a root of (4.29), satisfying 

i 2 = -1 (4.31) 

where -1 = (p-1) mod p. 

2 ~ 2 
If x + 1 = 0 is not solvable in GF(p) , l e GF (p ) 

plays a similar role over the finite field GF(p) that 

/-I = i plays over the field of rational numbers. 

A A 

For example, suppose (a + ib) and (c + id) are elements 
of GF(p 2 ), then by (4.31) 

(i) (a + ib) ± (c + id) = (a ± c) + i (b ± d) (4.32) 

/s ^ ^ 

(ii) (a + ib) (c+ id) = ac + i bd + ibc + iad 

(4.33) 

A 

= (ac - bd) + i (be + ad) 
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These are the analogues of what one might expect if (a + ib) 



and (c + id) were complex numbers. In applications to 
radar and communication systems one generally wants to 
take convolutions of complex numbers . 

If (-1) is a quadratic nonresidue mod p, then convolu- 
tion (4.20) of the complex integers can be performed with 

2 

transforms of type (4.25) on a Galois Field GF (p ) where 

2 

a and b are restricted to GF (p ). In other words, if 
n n 

2 

a , b e GF (p ) , for n = 0,1,2, ... d-1 the transforms are 



d-1 




n=0 



B 



K 



d-1 

Kn 

y b r , for K = 0,1, ... d-1. 
L n 

n=0 



where r is a generator of a d-element subgroup G^ 
Then taking the inverse transform of 



C. 



K 



A K • B K , for K = 0,1 



d-1 (4.34) 



one obtains 



d-1 



c 



n 




(4.35) 



K=0 



2 2 

where c n e GF (p ) and d divides p - 1 
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If p is a Mersenne prime, the order t of the subgroup 



with generator a, factor as follows [20] 



t = (2^-1) 2 -1=2 



2 q _ 2*2^ + 1 - 



= 2 



2q _ 2 q+i + 



1-1 



(4.36) 



q+l. 9 2q-q-! _ 



= 2 ^ (2 



1) 



q+l. 9 q-l _ 



= 2 ^ (2 



1) 



Since t has the factor d = 2^ + \ the usual FFT algorithm 
can be used to calculate transforms of as many d = 2^ +1 
points. If 



d = 2 K , l<K£q+l (4.37) 

2 

and a is a primitive element of GF (p ) , then the generator 
of G d is 



r 




(4.38) 



2 

In (4.25) if one wants to take the transform over GF (p ) , 
of 2^ + ^ point sequences of complex integers, a n e GF(p 2 ) 

then one needs to find a primitive element in G of 

2 2 
GF (p ). To achieve this, the following theorems are useful. 

For proofs, see [20]. 
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Theorem 1: 



K 

If p is a Mersenne prime and d = 2 , where 

A 

1 £ K £ q+1, then r = a + ib is a primitive 
. 2 

element in of GF (p ) , if and only if 



r 



d/2 



-1 mod p . 



(4.39) 



Theorem 2 : 

For Mersenne primes p > 3, the first 
quadratic nonresidue modulo p in the 
sequence 1, 2, 3, . .., p-1, is 3. 



To find a primitive element in G of GF (p ) , one can use 

the following procedure. 

A 2 

Assume an element r = a + ib is of order 2 q in GF(p ) . 

Now 



(a + ib) 29 " (a + lb) <a + £b) 



2'J-i 



(4.40) 



and, it can be proved [20] that 



(a + ib; = (a + i d) mod p . 



(4.41) 



Since 



J 2 q -1 = ? ± 2 q -2 = ? ( ? 2 } ( 2 q - 2)/2 



A ( 2 q - 2) /2 ? 
= i (-1) [ ]/ = -i 



(4.42) 
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Recall that q = 2, 3, 5, 7, 13, 17, 19, 31, 61, ... i 
prime. So that (4.40) becomes 

2^3 A A 

(a + ib) = (a + ib) (a - ib) mod p 

2 2 

= a + b mod p 
By Theorem 1, it follows that 



~ 2 ^ 2 2 

(a + ib) = a + b = -1 mod p 



since 



d 



2 q+i 



and 



2 q+l 

2 



2 



q 



Let 



X = mod 2^-1 

Y 5 -b^ mod 2^-1 

then (4.44) becomes 

X + 1 = Y mod p 

By definition in (4.46), X is a quadratic residue. 
Y, [see Appendix B, eq. B.ll, and B.9] 



(4.43) 

(4.44) 



(4.45) 

(4.46) 
For 
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^1 

p 




( T> ( T» 



( 



also (Appendix B, corollary B.5) 

<;!) = (-D (2 q -l-D/2 . ( -l)2 q ' 1 -l = .! 

P 



thus 



Y = X + 1 is a quadratic nonresidue. 

Hence, one way to choose the numbers X and Y is to let X 
and Y be two consecutive numbers from the set of integers 
1,2, ... 2 q -2, such that the first number X is a square and 
the second is a nonsquare. By Theorem 2, for p > 3, Y = 3 
is a nonsquare and the preceding element X = 2 is a square. 
Thus it is sufficient to let 

a 2 = X = 2 mod 2 q - 1 (4.47a) 

b 2 = -X- 1 = -Y = -3 mod 2 q - 1 (4.47b) 

To find the solution of congruence (4.47a) one uses the 
following procedure [20] . 

First, notice that [20] 

(— - ) = 1 (4.48) 

2 P - 1 
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Then by Euler's criterion [Appendix B] 



( — ) 

2 q - 1 



<<2 q -l)-l)/2 = lmod( 2 q -:n ,4.49) 



Multiplying both sides of the congruence by 2, then 



2 (<2 q -2)/2) + l . 2 2 q -l $ 2 moa (2 < I_ 1) 



Hence 



2 < 5“2 2 a 

(2 ) = 2 mod (2 4 -l) 



(4.50) 



Then the solution for (4.47a) is 



2*^-2 q 

a = ± 2 mod (2^-1) 



(4.51) 



Using the same procedure for (4.47b), b is given by [20] 



b = ± (- 3 ) 



q_2 q -I 

2 mod 2 



(4.52) 



In of GF (p ) there always exists a primitive element, 

A 

r = a + ib [20]. By Theorem 1 (4.39) 



(a + ib)^ 2 E -1 mod p 



(4.53) 



Raising both sides of (4.53) to the jth power, (4.53) becomes 
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((a + ib)^ 2 )^ = (-1)^ mod p 



(4.54) 



A j / <\ • 

By Theorem 1, ( (a + ib) ' ) is a primitive element only 

when j is an odd number. The elements are distinct and 
include all primitive elements of G^. 

2 

In other words, in the cyclic subgroup of of GF (p ), 



if (a + ib) is a primitive element then (a + ib) J is also 
a primitive element for j = 1,3,5, ..., d-1. 

Assume a is of order 2 q+ ^ in GF(p 2 ). If d divides 



. 2+1 



2 q+ ^ then r = a 2 ^ is of order d in GF(p 2 ). 



>9+1 



For the transform in (4.25), with d a factor of 2‘ , 

2 q+1 /d 

one can take r = a ' as the primitive element and d 
as the transform length. 



Example: Find all primitive elements expressed as a sum 

2 

of powers of two in the subgroup G^ of GF(p ) , 
where q=5, p=2 q -l = 31 and d = 2 . 

To do this, first find an element with order 
2 q+1 = 2 5+1 = 64 in GF(31 2 ). 

A 

According to Theorem 1, if r = (a + ib) is a primitive 

2 

element in G , of GF(31 ), then 
2 ° 

5 

(a + ib) = -1 mod 31 



By (4.44), that becomes 

a 2 + b 2 = -1 mod 31 
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By Theorem 2, 3 is a nonsquare and 2 is a square. By (4.51) 



,5-2 



a = 2 ‘ 



2 2 = 2 8 = 8 mod 31 



By (4.52) 

5-2 3 

b 5 (-3 ) 2 = (-3 ) 2 = (-3 ) 8 = (28) 8 s 20 mod 31 

Thus 64 is the smallest positive integer such that 

64 

(8 + i 20) = 1 mod 31 

An element with order 2^ + ^/d = 64/8 = 8 is 

(8 + i 20) 8 = 8 8 + ( 8 ) 8 7 (i20) + ( 8 ) 8 6 (i20) 2 

+ ( 8 ) 8 5 (i20) 3 + ( 8 ) 8 4 (i20) 4 

+ ( 8 ) 8 3 (i20) 5 + ( 8 ) 8 2 (i20) 6 

+ ( 8 ) 8 (i20) 7 + (i.20) 8 

g 

where ( ) are the binomial coefficients. Expanding and 
solving. 
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(8 + i 20) 



16 + lOi - 10 - 19i + 9 + 18i - 7 - 5i + 19 



A 

= (27 + 14) mod 31 



and since, if (a + ib) is a primitive element in then 

A • 

(a + ib)-^ is also a primitive element for j - 1,3,5, ... d-1. 
One has 



(27 + i4) 1 mod 31 

(27 + i4 ) 3 mod 31 = (4 + i4) mod 31 

(27 + i4) 5 mod 31 = (4 + i27) mod 31 

(27 + i4) 7 mod 31 = (27 + i27) mod 31 

. 2 
as the primitive elements in G 0 for GF(31 ). 

O 

In order to perform multiplications by powers of r in 

hardware, it might be desirable to represent the q-bit words 

/\ 

a and b, where r = a + ib, as a minimal sum of powers of 
two. Then, for example, r n+ ^ mod p can be obtained by 
multiplying r 11 by r modulo p recursively using a minimal 
number of bit rotations, q-bit word- complements, and 
additions [3]. 

2 

That is, the primitive elements in Gg of GF(31 ) can 
be written: 
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(27 + i4 ) = (2 5 - 2 2 - 2°) + i 2 2 

(27 + i4 ) 3 mod 31 = 4 + i4 = 2 2 + i 2 2 

?5 ? ? " 5 2 0 

(27 + ±4) mod 31 = 4 + i 27 = 2 Z + i (2 - 2 - 2 ) 

/%7 /s 5 9 0 ^ S 9 0 

(27 + ±4) mod 31 = 27 + i27 = (2 -2 -2 ) + i(2 -2 -2 ) 



Notice that, r = 4 + i4 has the shortest number of terms of 

power 2. This primitive element is such that multiplications 

2 

by powers of r in Gg of GF(31 ) yield the least hardware. 

Such an r is called the simplest primitive element. 

In order to perform the convolution of two d-point 
sequences of. complex integers a n and b n with dynamic range 
A and B, respectively, the components of the circular 

A 

convolution c = y + i 6 are required to remain in the 
P P P 

interval [20]. 



P-1 

2 



< 



V 



6 



P 



< 



P-1 

2 



(4.55) 



Specifically to satisfy (4.55) for all sequences a n = + i 8 n 

/\ 

and b = x + i y , such that la | , | 8 I < A, and |x | , 

n n n 1 n‘ 1 n‘ — „ 1 n' 

|y | £ B, it is necessary [20], that: 



dAB < 

— 4 



(4.52) 



If A = B, then by (4.52) the largest value of A is 
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A 



(4.53) 




] 



where [x] denotes the greatest integer less than x. This 
scaling constraint sometimes forces one to choose an 
excessive value p in order to avoid overflow. Such a large 
p implies a computer word length that is often undesirably 
long. 

31 8 

Example: Let p = 2 - 1 and let d = 2 



By (4.53) 



A = \f- 



31 

2 " 2 ] = 2 10 



4x2 



8 



If a and b are constrained to the interval 
n n 



-2 10 < a / 3 # x ,y < 2 10 
— n n n n — 



one is guaranteed to keep the components of c 
interval 



P 



in the 



-(2 31 -l)/2 < x < (2 31 - l)/2 . 

According to [30] and [29] , the Chinese Remainder 

theorem can be employed to develop a ring which is the 

. 2 
direct sum of certain Galois fields GF (p ). This ring is 

utilized to extend the dynamic range of complex number 

convolutions . 
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A disadvantage of this transform is that multipli- 
cation by powers of the primitive element is not as simple 
as that developed by powers of 2, in Mersenne transforms 
and Fermat number transforms. 

Recently, Reed and Liu [36] developed a high-radix 

2 

FFT algorithm for computing transforms over GF (p ) , where 
p is a Mersenne prime. This new algorithm requires substan- 
tially fewer multiplications than the conventional FFT. 

2 . Transforms Over the Finite Field GF (p) 

Colomb, Reed and Truong [19] introduces a Fourier- 
like transform in GF(p), where p is a prime of the form 
A n = 3 x 2 n + 1. This transform increases the variety of 
methods and the digital word lengths that can be used for 
computing the convolutions of integers beyond the previous 
Fermat or Mersenne number transforms. 

Let GF (p) be the finite field of residue classes 
modulo p, and let the integer d divide p-1. Also let the 
element y e GF(p) generate the cyclic subgroup of d elements, 
G d of GF (p) . 

Then a transform over this subgroup G^ can be 
defined by the pair [19] : 

d-1 

A Tr = 7 a Kn 0 < K < d-1 (4.54a) 

K u n — — 

n=0 

d-1 

a n = .(d) -1 l A k Y~ Kn (4.54b) 

K=0 
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where a , A e GF(p) for n = 0,1,2, ... d-1 and (d) 
n is. 

is the inverse of (d) mod p. Also it can be shown [19] 
that the circular convolution of two finite sequences of 
integers can be obtained as the inverse transform of the 
product of the transforms defined by (4.54). 

Notice that if p n is a prime of the form 3 x 2 n + 1 
the order t of GF(p n ) with generator a, is given by (2.14) 

t. = p -1 = 3 x 2 n (4.55) 

n 



Since t has the factor d = 2 n , the usual FFT 

algorithm can be utilized to calculate the transform of as 
n K 

many as d = 2 points. If d = 2 , 1 < K < n and a is the 
generator of GF(p n ), then the generator of G^ is (by 2.20) 



Y 



a 



3-2 n /2 K 



- a 



3-2 



n-K 



(4.56) 



the basic operations used in the transform defined by (4.54) 
are addition and multiplication mod 3 x 2 n + 1. 

Algorithms for searching a prime of the form 3 x 2 n + 1 
have been developed [19] , and will not be discussed in this 
thesis . 

As a final remark about this transform one should say 

that the same type of dynamic range constraint applies here 

2 

as m transforms over GF(p ), although m this case GF (p) 
refers to. integer convolutions. A special number theoretic 
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transform that can be computed using a high radix FFT is 
defined [23] over GF(p), a finite field of integers modulo 
primes p of the form (2 n -l)2 n + 1. Such primes are special 
cases of numbers of the form 2 m - 2 n + 1 proposed by Pollard 
[ 22 ] . 

\ 

If p is a prime of the form p = (2 n - l)2 n + 1 
the order t of a primitive root in the finite field GF(p) 
is given by 



t = p - 1 = (2 n - 1) 2 n (4.57) 

n 

Since t has a factor 2 , one can choose d = 2 , where 
1 1 . ^ 1 . n as the transform length. The arithmetic in 
GF(p) with p = (2 n - l)2 n + 1 is similar to the arithmetic 
in GF(p) where p is a prime of the form 3*2 n +l. Addition 
mod p involves at most a triple binary addition [23] . 

Multiplication by a power of 2 mod p can be imple- 
mented by using the combination of table lookups and mod p 
addition [23] . However, multiplications mod p still needs 
a full binary multiplication followed by a division [23] . 
This seems a disadvantage when compared with Fermat number 
transforms. The FFT over GF(p) is similar to the FFT over 
the complex number field, except that a root of unity y in 
GF(p) replaces e j2ir/d an( j that integer arithmetic modulo 
p is used. 

It turns out [23] , that the number of modulo p 
multiplications required for the evaluation of an FFT over 
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GF(p), p = 2 n (2 n -l) +1/ is greatly reduced when compared 
to the number of multiplications required to evaluate the 
FFT over the complex number field. 

It was found [23], that the number A n = (2 n -l)2 n +l 

is prime only for the cases m = 1/2,4,32. Hence only 

32 32 

transforms over GF (p) when p = (2 -1)2 +1 have practical 

application in digital filtering. The maximum transform 
length in these conditions is equal to 4 294 967 296! 

It can be shown that high radix FFT can also be 
used to compute transforms over a finite ring modulo a 
number of the form (2 ra -l)2 n +l with m even and m = 1 or 
m = n, in a similar fashion for the GF (p) case. The condi- 
tion for such a transform to exist was shown in References 
[22] and [20]. 

3 . Complex Mersenne Transforms and Complex Pseudo- 

Mersenne Transforms 

The main limitations of the Mersenne transform 
approach are related to the fact that the number of trans- 
form terms g is a prime. This means that calculations of 
the transforms cannot be simplified by an FFT-type algorithm 
and that the number of transform terms is equal to the 
word size. These limitations can be slightly alleviated by 
using a root (-2) instead of 2 in (.4.4) and (4.5), as said 
previously (4.8). The maximum transform length then becomes 
2q. It is also possible to increase the maximum convolution 
size by resorting to multidimensional convolutions [3] , [4] , 

[31J . Unfortunately, this result is achieved at the expense 
of increased requirements for computation and storage. 
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By defining Complex Mersenne Transforms, it is 
possible to achieve higher computation efficiency while 
increasing both maximum transform length and convolution 
length. Again, in a Merseene ring, with p = 2 q - 1, (2) 

and (- 2 ) are respectively roots of orders q and 2 q, 
corresponding to transforms of lengths q and 2 q, respectively. 

Since j is a prime, 2^ and -2^ are also roots of 
q and 2 q, provided d is not a multiple of q (by 2 . 20 ) . 

Notice that (2j) is a root of order 4q, since 

(2j) 4q = (2 9 ) 4 (j 4 ) q = (l ) 4 (l) q = 1 mod 2^ - 1 (4.58) 



Also (1+j) is a root of order 89, since 



(i+j ) 



8q _ 



= [(1 + 



ji 8 j q 



and 



( 1 +j ) 8 = 1 + (J) l 7 j + (®) l 6 j 2 + ( 3 ) 1 j 3 + ( 4 ) l 4 j 4 



. 8 . .3 .5 . , 8 , ,2 .6 . , 8 , . .7 . .8 
+ ( 5 ) 1 3 + ( g > 1 3 + ( 7 ) 1 3 +3 



= 1 + 8 j - 28 - 56 j + 70 + 56 j - 28*8j + 1 



= 16 + Oj 
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Then 



(l+j) 8q = (2 4 ) q = ( 2 q ) 4 = (1) 4 mod 2 q -l = 1 mod 2 q -l (4.59) 

The same conclusions can be drawn with respect to (1-j), 
ie., (1-j) is a root of order 8q, in the ring p = 2 q -l, 
for q = 2,3,5,7,13,17,19,31,61, . . . 

Higher order complex roots do not have a simple 
structure and therefore will generally not be of practical 
interest. 

Under these conditions, a Complex Mersenne Trans- 
form having 4q terms can be defined by [24] , 

4q-l 

A = l a j nK 2 nK mod (2 q -l) (4.60) 

js, n 

n=0 

j = '/~ 1 , K = 0,1, ..., 4 q— 1 

Notice that q has an inverse modulo p, and that the 

2 

inverse of 4 is 2 q , since 

4 = 2 2 (4) _1 mod 2 q -l = 2 q_2 (4.61) 



for 



2 2 • 2 q 2 = 2 2 • 2 q * 2~ 2 = 2 q = 1 mod (2 q -l) . 
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Then 4q has an inverse R such that 



( 4qR) mod 2 q - 1 = 1 mod 2 q - 1 

and the inverse transform of is 

4q-l 

a = R l A j _mK 2 _mK mod 2 q - 1 (4.62) 

m i\ 

K=0 

m = 0, 1, . . . , 4q-l 

where all exponents and indices are taken, modulo 4q, 
in both (4.60) and (4.62). 

Using a root (j+1) leads to a definition of a 
Complex Mersenne Transform having 8q terms with 

^ q 1 a (l+j) nK mod 2 q -l (4.63) 

“ l “ 

n=0 

K = 0,1, ..., 8q-l 

and, with R such that (8qR) mod 2 q -l = 1 , an inverse 

transform 



8q-l 

a = R J A (l+j) -mK mod 2 q -l (4.64) 

m L K J 

K=0 

m = 0, 1, ..., 8q-l . 

with all exponents and indices taken modulo 8q. 
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Computation of a complex convolution by means of 
Complex Mersenne Transforms is carried out as with the 
DFT, with real and complex parts being evaluated modulo p 
separately. In order to avoid errors due to overflow, the 
amplitudes of real and imaginary outputs must be bounded 
to (p-l)/2. This means that usually the word length of 
real and imaginary parts of input sequencds (a n ) and (x n ) 
is less than half that of output sequences. In other words, 
all computations are carried out modulo p on q-bit words, 
yielding q-bit word outputs, and the input sequences are 
represented by words of length less that (q-l)/2 bits. 

Notice that the calculation of these transforms can be 
partly simplified by an FFT-type algorithm because the 
number of terms is no longer a prime . 

It has been shown 124] that, in the practical range 
of interest for q (where q = 31) , substituting Complex 
Mersenne Transforms for conventional Mersenne Transforms 
results approximately in an eightfold reduction in the 
number of operations . 

If the two sequences (y n ) and (a n > to be convolved 
are real, the full benefit of using Complex Mersenne Trans- 
forms can be retained by processing simultaneously two 
successive blocks of sequence (y n ) by means of the same 
Complex Mersenne Transforms . This is done by computing the 

complex convolution (Z }, of the sequence (a } with the 

m n 

auxiliary complex sequence (x = y + j y . 0 ) . The real 

n n n-h o c[ 
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part {u } and the imaginary part {u > of {Z > yield 
r m m m 

respectively the convolution of {y n > and the next block 
{ >'n+8q ) uith {a n K 

Up to now, the discussion of this section has been 
restricted to Mersenne numbers, that is, to numbers 
p = 2 q - 1 such that p is a prime. If p is not a prime, 
its prime factorization is given by 



P 




(4.65) 



Recall that an m-point real transform having the circular 
convolution property can be defined in the ring of integers 
modulo p, provided m-point transforms can be defined 
separately in the fields p^, p 2 , ..., p^. 

This follows directly from the Chinese remainder 
theorem (2.27), and leads to the conditions for the exis- 
tence of an m-point transform in the ring p that m must 
simultaneously divide p^-1, P 2 - l' •♦♦/ P^~l- When p is 
a prime, the maximum length of the transform is M = p-1. 

Transforms in a ring p, with p nonprime, are there- 
fore proportionally shorter than transforms defined modulo 

a prime number. If p and q are composites with q = 

^1 

and q prime, 2 -1 divides p and the maximum transform 

q l 

length is governed by that possible for 2 -1. 

This led Rader [3] to consider that the only transforms 
of interest in a ring modulo 2 q -l were Mersenne Transforms. 
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The situation changes noticeably if one considers 
Comples Mersenne Transforms. Nussbaumer [24] shows that 
given an m-point real transform of root 2 with p composite 
and q, ,m odd integers, one can define 8m- point complex 
transforms in the ring modulo p with 



8m- 1 


j = 


v / -i . • \ wnK 

l a n (1+3) 


(4.66a) 


n=0 


K = 0,1, .. . , 8m- 1 



8m- 1 

a m = (8m)" 1 l A k (l+j)" wnK , m = 0,1, ..., 8m- 1 

K=0 

Notice that, the existence of an m-point real transform 
in the ring modulo p implies that m has an inverse, 

R modulo p. Also, the inverse of 8 modulo p, is 

8 _1 = 2 q ~ 3 since 8 x 8 _1 = 2 3 -2 q ~ 3 = 2 3 -2 q -2 -3 = 2 q = 1 



thus 



n /"t o 

(8m) = 2 q • R mod p exists. 

Table I shows the various possibilities for p 
nonprime and odd q. 

The case of q prime (q = 23,29,37,41,43,47) corres- 
ponds to conventional Mersenne Transforms. When q is not a 



mod p 



96 



C/3 



CM -P 

5 8 

6 O U 



S 8 



I 

*H 

C4 

4-1 

0 

1 

rd 

N 

M 



fd 

44 



& 



44 

0 

§ 

-H 

Id 

N 

•H 

§ 

1 



54 



D 1 ' 

CM 



*H 

C4 



CM 

C4 



C4 - 

a 



O 1 



\ \ 



CM 



CO 



CO 



CO 

CM 



cn 

CM 



p** 

co 


















au 
















CO 


co 


































CM 


• 
















• 


p- 
















i — I 


• 
















CO 


uo 
















• 


• 


— «. 








CM 




au 


1 — 1 


CO 


au 








UO 




CM 


1 — 1 


CM 


co 








• 




• 


• 




o 








CM 




CM 


CO 




i — I 








CO 


CTU 


CO 


• 


P- 


*•> 


X — V 


r- 




• 


i — 1 


• 


CM 


• 


p- 


CM 


• 




CO 


• 


ro 




CM 


uo 


UO 


co 




CM 


CO 


CM 




CO 


X 


• 


• 


P** 




CO 


* — 1 


1 — 1 


• 




CO 




au 




• 




1 — 1 


CM 


p** 


• 


CM 


• 


CM 


au 


CTU 


• 


- — 


CO 


CM 


" — ' 


CO 


un 


CM 


CM 


CO 


x — „ 


• 






CM 


• 


„ " — 


• 


CM 


P- 


CO 




P- 


• 


CO 


x — V 


au 


< * 


• 


• 


un 


• 


UO 


• 


CM 


i -4 




UO 


uo 


• 


CM 


• 


CO 


CO 


• 


i — 1 


• 


CM 


co 


CO 




CM 


• 


CM 


•—1 


CM 


" — 


• 


• 


CM 


v — 1 ’ 


CO 




• 




^ — N 


CM 


CM 


" — 1 




CM 




CM 




P- 


v — ■ ’ 




X N 


UO 


v — 1 ’ 






uo 


CO 


„ S 


— N 


CO 


• 




CM 




• 


• 


CO 


CO 


CM 


CO 


CO 


• 


CO 


co 


CO 


• 


• 


• 


• 


• 


CO 


• 


• 


• 


CM 


CM 


CM 


CM 


CM 


CM 


CM 


CM 


CM 



co 

i — i 

• 

r- 

un 

• 

CM 

CO 

• 

CM 

ro 

i— l 
• 

CO 
f CM 



CD 

CO 

CO 

CO 

i-4 

• 

CO 



CO CO 
• CM 
CM ^ 



o 

LD 

CO 

cd 

LD 



CO 

CM 



CO 



co 



p" 

CM 



CM 

CO 



CO 



CM 

ro 



co 



co 



CM 

LD 



CO 

CM 



CM 

CO 



p- 

CO 



p- 

LD 



CO 

CM 



CM 

CO 



^ CM 
CO ^ 
UO 



CM 



CO 



LD 

CM 



CO 



LD 



CO 

CM 



uo 

CM 



p- 

CM 



au 

CM 



CO 

CO 



un 

co 



p- 

co 



au 

co 



CO 



un 



p- 



UO 

un 

p- 



co 



CM 



P- 



— cm 

uo 

CO 

• 

CM 
^ CM 
^ CO 
CO • 

• CM 
CM ^ 



P- 

UO 



























co 


CD 


























CM 


CM 
















i — 1 


p- 


au 




CO 


• 


un 
















CM 


p- 


uo 


CO 


uo 


1 — 1 


nr 














au 


au 


1 — 1 


CO 


un 


CO 


CO 


uo 












au 


p- 


CM 




• — 1 


co 


CD 


uo 


CM 












CO 


nr 


CM 


00 


CM 


• — i 


au 


• 


CO 








1 — 1 


p- 


o 


<T\ 


1 — 1 


1 — 1 


1 — 1 


• — I 


o 


1 — 1 


i — 1 




p" 




o 


un 


CM 


au 


• 




• 


un 


CM 


un 


• 




CO 


i — 1 


CO 


uo 


• 


un 


p* 


CO 


1 — 1 




• 


i — I 


CO 


i — 1 


CO 


CO 


1 — 1 


CM 


co 


• 


CM 


uo 


au 


uo 


au 


• 


1 — 1 


uo 


• 




• 


UO 


o 


au 


1 — 1 


1 — 1 


1 — 1 


• — I 


i — l 


CO 


un 


i — 1 


p> 


CO 


1 — 1 


CM 


1 — 1 


00 


• 




00 


• 


p- 


p" 


nr 


• 


CM 


p** 


o 


• 


1 — 1 


• 


1 — 1 


uo 


• 


p- 


CD 


• 


• 


i — 1 


i — 1 


1 — 1 


uo 


CO 


t 


CO 


p- 


• 


a> 


uo 


• 


1 — 1 


i — l 


CO 


• 


• 


• 


p* 


CO 


CM 


• 


CO 


p* 


CO 


i — 1 


CO 


uo 


• 


CM 


p* 


1 — 1 


• 


CM 


• 


1 — 1 


CM 


• 


CO 


CO 


• 


CO 




P** 


nr 


CO 


p- 


CM 


P* 


CO 


CM 


p- 


1 — 1 


nr 


p- 


CM 



p- 



co 

uo 



co 



un 

CM 



p- 



U 0 

CM 



o 

o 

p 



CTU 



97 



127-4432676798593 (2-3 -7(2-3 -7 -43-337-5419) 



prime , the corresponding transforms are called pseudo- 
Mersenne transforms. They have a very short length and 
their roots are not powers of 2. Notice that, in order to 
achieve maximum effectiveness in computing convolutions by 
means of pseudo-Mersenne Transforms, it would be desirable 
to have relatively long transforms with a number of terms 
highly factorizable . This does not seem possible with 
transforms modulo p = 2^-1. One notes, however that when 
p is not a prime, with the prime factorization of p defined 

di 

by (4.65), one can define transforms modulo p/p^ having 
power of 2 roots and such that the number of terms is 
large and highly factorizable. 

These transforms can be defined by 

8 m- 1 , 

nTC d i 

A R = ( l a n (1+j) ) mod p/p i (4.67) 

n=0 

K = 0,1... 8m-l 

j = AT 

Various possibilities of such transforms are listed in 
Table II. 

It can be seen that the maximum number of terms is 
both large (40 to 392 terms) and highly factorizable, thereby 
leading to efficient FFT type computation with a minimum 
number of operations. It would seem, however, that these 
advantages are offset by the fact that the various operations 
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are performed modulo (2^-1) /p^ 1 . The corresponding arith- 
metic circuits are much more complex than arithmetic 
circuits modulo 2^-1 [24] . 

This difficulty can be circumvented by noticing 

that as 




one can compute the convolution modulo p = 2^-1 as with 

conventional Mersenne Transforms and obtain the final 

d i 

result by performing a last operation modulo p/p^ on 
the convolutions computed modulo p, i.e., 

d. d. 

mod p/p^ 1 = (Z m mod p) mod p/p^ 1 (4.68) 

By proceeding in this fashion relatively long 
convolutions can be computed efficiently by means of FFT- 
type algorithms with all but the last operation performed 
with implemented arithmetic circuits operating modulo (2*^-1) 
[24] . 

Taking as an example the case of transforms defined 

by q = 25, one can see from Table I that the maximum odd 

25 

length for real transforms computed modulo p = 2 -1 is 

15 terms and that the corresponding roots are not powers 
of two. By operating modulo (2- -1)/31, it is possible 
to define real transforms having power-of-two roots with a 
maximum odd length increased to 25 terms. The maximum 
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length is then expanded to 200 terms by using complex 
roots. Such a transform can be computed very efficiently 
by means of an FFT type algorithm with a three-stage radix 
2 decomposition, followed by a two-stage radix 5 decomposi- 
tion. 

One limitation of conventional Mersenne Transforms 
is the rigid relationship between word length and trans- 
form length. In this respect, pseudo-Mersenne Transforms 
provide a significant improvement because their maximum 
number of terms M max is highly composite and any transform 

length submultiple of M can be selected. 

max 

It is even possible to have several transforms of 
identical length and defined modulo integers p^, ... p^ 

that are relatively prime. The convolution can then be 
computed separately modulo p^, ... p^ and then the final 
result obtained modulo (p^*P 2 *’*P^) by the Chinese remainder 
theorem. This approach could, for instance, be used to 
compute a 40-term convolution with an approximate word 
length of 32 bits by means of transforms defined by 
modulo (2 15 -l)/7 and (2 25 -l)/31. 

4 . Pseudo Fermat Number Transforms and Complex 

Pseudo Fermat Number Transforms 

This section considers a generalization of Fermat 
Number Transforms, such that transforms having roots which 
are powers of 2 are defined in field or ring which is a 
factor of an integer, p = 2 g +l. With such pseudo Fermat 
Number transforms it is possible to have much more flexibility 
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in selecting desired word lengths than with conventional 
Fermat Number Transforms. In some cases it is possible 
to define complex pseudo FNT's which are well adapted for 
filtering complex signals and allow increased transform 
and convolution lengths when compared to conventional 
FNT ' s . 

Number theoretic transforms in a ring submultiple 
of p = 2^+1 are called pseudo Fermat Number Transforms 
[37]. 

If one restricts to roots 2 , one can define an 
M-term pseudo FNT and its inverse by 

M-l d 

A K = ( l a n 2 wnK ) mod p/p i i (4.69a) 

n=0 

M-l d. 

a^ = (R J A R 2 a)ItlK ) mo d p/p^ (4.69b) 

K=0 

m = 0,1 ... M-l 
d. 

with M*R = 1 mod p.p^ 1 . 

It can be seen that these transforms have the same 
structure as pseudo Mersenne transforms but are defined in 
a ring submultiple of 2*^+1 instead of a ring submultiple 
of 2^-1 for pseudo Mersenne transforms. 

The choice of the particular ring on which pseudo 
FNT's are defined is very important. Usually, p will be 
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divided by its smallest factors, with the remaining factors 
large enough to allow defining long transforms with powers 
of two roots. Pseudo FNT's can be defined to q even and q 
odd [37] . Various possibilities for such transforms with q 
even are listed in Table III. 

It can be seen that there is much more flexibility 
in word length and transform length selection than with 
transforms defined modulo 2^+1. 

Here also as in the case of pseudo Mersenne trans- 

q d i 

forms , performing the various operations modulo 2 +l/p^ 
would seem rather awkward, as the corresponding arithmetic 
circuits are much more complex than those operating 
mod (2*^+1) . Again the solution is to compute the convolu- 
tion modulo p = 2*^+1 as with conventional FNT's and obtain 

d. 

the final result by performing a last operation modulo p/p^ 
on the convolution evaluated modulo p: 

d d. 

Z modulo p/p. = (Z modulo p) modulo p/p. 
m *1 m l 

(4.70) 

Assuming the factorization of the number of transform terms 
is given by 



M = • M 2 • ■ • M j (4.71) 

the pseudo FNT can be computed by a mixed-radix FFT-type 
algorithm. 
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FOR PSEUDO FNTS IN THE RING (2 q +l)/p i 1 WITH q EVEN 
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5-277-1013. 1657-30269 (2 +l)/5 92 13 920 773 304 712 2 

(2 2 -23) 



If q is odd, it is possible to increase the maximum 
transform length [37] by using complex pseudo FNT's. The 
existence of complex pseudo FNT's can be demonstrated by 

considering an M term pseudo FNT defined in the ring 

*^i q W 

p/p^ = (2 +l)/p^ with a root 2 of order M. 

Notice that if W and q are odd, the condition 

M • W = 2q (4.72) 



implies that M is even, and M/2 is odd. Under these 

W 

conditions, (-2) is a root of order' M/2 since 

[ (- 2 ) W ] M / 2 * (- 2) WM//2 = (- 2 ) 2q / 2 = (- 2 ) q 



= (-2 q +2 q +l) = 1 mod 2 q +l (4.73) 



TVT J 

and (-2) is also a root of order M/2, provided d and q 

W . 

have no common factors [37J . This suggests that (2j) is 
a root of order 2M, since 



(2 j ) 



. * W2M = ( 2 q ) 4 ( -i 4 ) q = f-l) 4 (l) q = 1 ^ q 



= ( 2 j ) "* >1 = = (-1) (1) = 1 mod 2^+1 

(4.74) 



W . 



and (1+j) is a root of order 4M since 



(l+j) W4M = (l+j) 8q = (2 4 ) q = (2 q ) 4 = (-l) q = 1 mod 2 q +l 

(4.75) 
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thus a 2M term pseudo FNT can be defined as [37] 



2M-1 



d ± j = /-I 



^ = ( I a n ( 2 j ) WnK ) m od 



K = 0,1 ... 2M-1 



n=0 



(4.76a) 



d. 



As M has an inverse in the ring p/p^ and 2 is relatively 



i 



d. 



prime with p, 2M has an inverse R in the ring p/p^ 1 and 



an inverse transform can be defined by [37] 



2M-1 




a 



m 



(R l A r (2 j ) WmK ) mod p.p i 



(6.76b) 



K=0 



m = 0,1, ... 2M-1 



with all exponents and indices taken modulo 2M. 

It can be demonstrated that the transform satis- 
fies the convolution theorem [37J , and that two complex 

sequences of length 2M can be cyclically convolved via 

d. 

complex pseudo FNT's modulo p/p^ 

In such an approach, all arithmetic operations are 

2 

performed as in normal complex arithmetic with j = -1, 

and real and imaginary parts treated separately modulo p. 

The final convolution product is obtained by performing a 

d. 

last operation modulo p/p^ 1 . 

A 4M-points complex transform can be defined by 
using a root (1+j) instead of (2j) [37] 
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(4.77) 



4M-1 d 

= ( l a n (l+j) WnK ) mod p/p ± 1 

n=0 

K = 0,1 ... 4M-1 

Higher order complex roots have a complex structure and 
therefore the maximum length of complex pseudo FNT's which 
can be computed without multiplications is 4M in the general 
case, and 8q when W = 1 (M = ^-) . 

It can be seen that using complex pseudo FNT's 
allows for a given computation complexity, operation over 
transforms and convolution lengths twice as long as with 
real Fermat and pseudo FNT's. 

In particular, when W = 1, the maximum length of 
an FNT is 4q for a root /2, while the maximum length of a - 
complex pseudo FNT is 8q for a (1+j) root. 

Various possibilities, q odd, for complex pseudo 
FNT's are listed in Table IV. It can be seen that for q 
prime, we have complex transforms of length 8q with root 
(1+j). These transforms have a number of terms which are 
not highly f actorizable . 

A more interesting case seems to correspond to 
those nonprime values of q, for which it is possible to 
define transforms with a number of terms which are both 
large and highly factorizable and therefore lead to efficient 
computation via an FFT type algorithm. 
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In these respect, the 200-points and 392-points 

. 25 

transforms defined respectively modulo (2 +1)/3.11 and 

49 

(2 '+l)/3.4 3 are particularly attractive. 

It is sometimes desirable to compute convolutions 
with improved dynamic range. In this case the same convo- 
lution can be computed modulo several relatively prime 
integers p^, p 2 * ... p^ and the final result obtained 
modulo (p^*p 2 ***p^) via the Chinese remainder theorem. 

For this application, the availability of complex pseudo 
Mersenne transforms and pseudo FNT's having the same length 
and defined modulo relatively prime integers is particularly 
interesting. A 200-point convolution could for instance 

be computed with a dynamic range of about 40 bits via complex 

25 

pseudo Mersenne transforms defined modulo (2 -1)/31 and 

25 

via complex pseudo Fermat transforms modulo (2 +1)/3.11. 
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V 



IMPLEMENTATION OF FERMAT NUMBER TRANSFORMS 



The best known number theoretic transform is the Fermat 
Number Transform (FNT) . FNT are potentially attractive 
for digital filtering applications because they have the 
convolution property (3.5) and can be computed without 
multiplications . 

In principle, such Number Theoretic Transforms (NTT's) 
could be implemented in the same way as Discrete Fourier 
Transforms but with multiplications by trigonometric func- 
tions replaced by multiplications by powers of two, all 
operations being performed modulo a Fermat number. 

In practice, however, direct transposition of Fast 
Fourier Transform (FFT-) architectures does not necessarily 
lead to optimum implementations and the development of 
special configurations to computing NTT's seems worth 
exploring. Along these lines, the special attributes of 
the FNT, led several researchers to consider various coding 
techniques for simplifying the implementation of the trans- 
form and special purpose implementations of the FNT. 

This section will discuss these concepts. In part A 
various coding techniques are presented and in part B the 
realization of the FNT is considered. 

A. BINARY ARITHMETIC FOR THE FERMAT NUMBER TRANSFORMS 

In computing the FNT, arithmetic is done modulo F = 2 , 
b = 2 t (4.11). In this arithmetic the only allowed integers 
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are {0,1 ... 2 } and all integers whose absolute value do 
not exceed 

Vi _ (2 b +l)-l . £ . 2 b-l (51) 

can be represented unambiguously. 

Negative numbers are represented by adding 2 D +1 to 
them; this is similar to twos complement and ones comple- 
ment representation of negative integers . 

Notice that in a b-bit register, all integers from 
0 to 2-1 can be represented. 

Example: Let F = 2 b +l = 2 4 +l = 17 b = 2 b = 2^ = 4 

(i) allowed integers {0,1,2, ..., 16 = 2 4 } 

(ii) absolute values of number that can be represented 
unambiguously do not exceed 

= iZ^Li = 8 = 2 13 - 1 = 2 4 ' 1 = 8 

(iii) negative numbers: 

-5 = (-5 + 17) = 12 mod 17 

(iv) a b = 4 bit register allows as maximum 

1111 2 = 8 + 4 + 2 + 1 = 15 1q = 2 b -l = 2 4 -l = 15 1q 
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Thus, using a b-bit register, the problem remains to 
represent the quantity 2 ° (in the example above, 2 ° = 2 4 = 16) . 

If data is uncorrelated, the probability that this number 
will appear after an arithmetic operation is approximately 
2 _b [17]. 

For digital filter applications, b would typically be 
32 or 64; in these cases the occurrence of 2 b is extremely 

\ 

small. 

In their software realization of the FNT, Agarwal and 

Burrus [4] define a binary arithmetic modulo F = 2+1, 

b = 2 b . The representation of such a modulus requires 

b 

(b+1) bits, for the representation of the quantity 2 =1 

mod F^_ . In order to simplify modular arithmetic, Agarwal 
limits his realization to a b-bit arithmetic. 

This involves some input quantization error where (-1) 
occurs as an input sample, as well as the extremely small, 
but realistic, probability of a complete data block in 
error when a (-1) occurs as the result of an FNT computation. 

The following discussion is based on the b-bit represen- 
tation of integers. The various basic arithmetical opera- 
tions can be implemented modulo F [4] . 

(i) Negation 

b-i • 

Let A = E a. 2 , a.=0 or 1 (5.2) 

i=0 1 1 

Then by [4] 
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-A 



(2 b -l) 



(5.3) 



b-1 b-1 

= I a. 2 1 = l a7 2 1 - 

i=0 i=0 

Example: Let A = 8 = (1000)2 = 8 mod 17 (= 2 4 +l) . 

Then 

-A = (0111) - (2 4 -l) = 7 - (15) = -8 

Notice that (5.3) can be arranged in the following form: 

b-1 

-A = l a~ 2 1 - (2 b -l) 
i=0 

b-1 

= l aT 2 1 - (2 b -l) + (2 b +l) mod 
i=0 



b-1 

J 

= l a“ 2 1 + 2 (5.4) 

i=0 



Thus to negate a number, one has to complement each bit 
and add 2 to it. 

Example: Let F^_ = 17 

0111 

A = 8 = 1000 -A = -8 = { +10 } = 9 mod 17 

1001 
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Example : 



17 



Let F = 2 4 + 1 = 

/ 

0111 

and A = 8 = 1000 ; -8 = < +10 / = 9 mod 17 

1001 
; / 



(ii) Addition 

When one adds two b-bit integers , one obtains a 
b-bit integer and possibly a carry bit. The carry bit 
represents 2 ° = -1 mod . 

To implement addition in arithmetic modulo 2 D +1, 
one simply subtracts the carry bit. Thus the hardware 
should be of the carry subtract type. 



Example : 



Let F = 17 






1111 

15 + 10 = 25 = 8 mod 17 = { + 1010 

.001 



) = 8 mod 17 



QlOOl 
^ 1000 



(iii) Subtraction 

Subtraction is implemented as an addition by first 
negating the subtrahend and then adding terms. Addition 
must be carried out according to (ii) . 



Example : 



Let F = 17 



13-4 = 



13 + (-4) = ' 



13 

-4 



= 1101 
= (-)0100 



1011 

+10 
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1101 



13 + (-4) = 9 = 



1101 

1101 



moio 

-1 

1001 = 9 mod 17 



(iv) General multiplication 

When one multiplies two b-bit integers, one gets 
a 2b-bit product. Let C L be the b-bit low order of it 
and C„ be the b-bit high order part of it, then 

rt 

A x B = C T + C„ 2 b = C T + (-C„) mod F. 

±j xi Jj rl t 

Thus, all one has to do is subtract the higher order b-bit 
register from the lower, order b-bit register. The subtrac- 
tion needs to be done according to (iii) . 

2 

Example : Let F =2 + 1 = 17 

15 x 10 = 150 = 14 mod 17 



Now 




150 



10 



= ( 



1001 0110 
C H C L 



) , and by (i) 



1001 — ► 0110 
+10 
1000 



.C L = 0110 
C-C H ) = 1000 

1110 = 14 mod 17 



t 
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(v) Multiplication by a power of 2 

If x is taken as 2 or a power of 2 (in 3.7), 
the only multiplications involved in taking the Fermat 
transform are those by some power of +2 . These multipli- 
cations are particularly simply to implement in arithmetic 
modulo F^_ . Suppose one needs to multiply A by 2 , 

0 < K < b, all one needs to do is shift to the left the 
content of the register by K bits, and subtract the K 
overflow bits (assuming double b-bit registers) . If K is 
outside the range 0 < K < b, one makes use of the fact 
2 b = -1 mod F^_ . 

Computation of the inverse transform requires 
multiplications by negative powers of 2 which can be 
converted to positive powers by the following relationship 

2~ K = -2 b • 2~ K = -2 b_K mod F fc (5.5) 

Example: Let F^_ = 2^ + 1 = 17 



(i) 



15 x 2 2 = 60 = 9 mod 17 



15 = 0000 1111 



Shift left 2 positions 



0011 1100 



'H 



C„ = 0011 — * 1100 

n 

+10 

-c„ = 1110 

n 



(- C H } 



= 1100 

= 1110 




1001 



9 mod 17 



_ 
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(ii) 13 x2 -3 = 13 x (-2 4 " 3 ) = 13 x (-2) = -26 = 8 mod 17 



13 = 0000 1101 



C H C L 



Shift right dircularly 3 positions 0001 



C H C L 



C T = 1010 0101 C„ = 0001 

li ri 

+10 (-C L ) = 0111 

-C L = 0111 1000 = 8 mod 17 

For the implementation of the Fast Fermat Transform, unlike 
the FFT, one does not need to store the powers of x. For 
serial arithmetic, one could have a register which stores 
the shift amount K, and as one goes along the Fast Fermat 
Transform flow chart, one continually update the shift 
amount. This realization of b-bit arithmetic involves, as 
previously said, some input quantization error when (-1 = 2 ) 
occurs as one input sample, as well as the extremely small, 
but realistic, probability of a complete data block in error 
when a (-1) occurs as the result of a FNT computation. 

It is, of course, desirable to compute the FNT 
exactly. The difficulties in performing binary arithmetic 
modulo F , become apparent when one considers, for example, 
multiplication or addition in a ring of integers modulo F^, 
involving the binary representation of -1 = 2 . For example, 
when b = 4, 2^+1 = 17 the product of 10000 (-1) with itself 
is 00001 C+l) . 
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A technique for the exact computation of the FNT 
and its implementation are described by McClellan [26] . 
McClellan's approach involves the definition of a new binary 
code representation for the integers modulo F^. 

Given a binary representation of (b+1) bits, 

A = [a b , a b-1 a Q ] (5.6) 

this new code is described as follows. If 

a b = 1 then A = 0 

(5.7) 

u i V\— . 9 

a, = 0 then A = a, .. 2 + o 2 + . . . + o 

b b-1 b-2 o 

where 

1 if a . = 1 

a . = 

D 

-1 if a j = 0 

b 4 

Example: Let F-j- = 2 +1 = 2+1 

10000 represents zero 

00111 -2 3 + 2 2 + 2 1 + 1 = -1 = 16 mod 17 

01011 2 3 - 2 2 + 2 + 1 = 7 mod 17 



(5.7a) 



= 17, b = 4 



118 



and 10101 is an illegal combination, since the only allowed 
number with the most significant bit (MSB) equal to 1 is 
10000 (zero) . Consider arithmetic operations using this 
number representation. 

(i) Multiplication by a power of two. 

If the number is zero (i.e., a^ = 1) one does nothing. 

If the number is nonzero, the low order b bits are circularly 
shifted to the left a number of places equal to the power 
of 2, and a bit is replaced by its complement as it enters 
the least significant bit position (LSB) . 

Example: Let F = 2^+1 = 2 4 +l =17, b = 4 

Using the proposed code 



01100 



3 2 

2 + 2 - 2 - 1 



9 mod 17 



applying the above rule 



9x2 



4 lower order bits 



9 01100 



x 



2 




01001 

18 = 1 mod 17 = 01000 = 2 3 -2 2 -2-l = 1 mod 17 
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9x8 



? 



Further , 



01100 



01001 



9 

x 

2 




00001 

36 
x 

2 

00000 

72 = 4 mod 17 = 00001 = -2 3 -2 2 -2+l = -13+17 = 4 mod 17. 




In a hardware implementation the MSB is used as a control 
bit. If it is one then the number is zero and the rotation 
is inhibited. This is characteristic of all operations 
using this new coding scheme. 

(ii) Negative of a number 

This is done by complementing the low order b bits 
except in the case where a^ = 1. Again the MSB is a control 
bit that inhibits the operation if it is one. 
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Example: Let F = 2^+1 = 2 4 +l =17; b = 4 

Using the proposed code 

A = 01101 = 2 3 + 2 2 - 2 + 1 = 11 mod 17 

and by applying the above rule 

-A = 00010 = -2 3 -2 2 +2-l = -11 mod 17 

(iii) Addition 

If either or both of the operands for addition are 
zero (i.e., a^ = 1 or = 1), then there is no addition 
to take place. That is, these special cases can be sensed 
and the addition inhibited. Now consider the addition of 
two numbers A and C where A ^ 0 and C ^ 0. 

Let 



and 



A = a, a, , ... a with a. = 0 

b b— 1 o b 

(5.8) 



C 



c b c b-l 



c with c, = 0 

o b 



Interpret the b LSB's of A and C as the binary 

A /N A ^ 

representation of A and C, and form the same A and C using 

A 

unsigned binary addition to obtain D. 
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That is 



; _ 0 b-l , 0 b-2 , 

A a b-l 2 + a b-2 2 + + a o 

(5.9) 

+ 

C = c, . 2 ° 1 + c, „ 2 b 2 + . . . + c 
b-1 b-2 o 

A+C = D = d, 2 b + d, . 2 b_1 + d, „ 2 b " 2 + ... + d 
b b-1 b-2 o 

/\ 

It is possible to deduce from D, the desired sum 
D = (A+C) mod 2 b +I [26]. 

Notice that 

A 

A = 2A + 2 [26] (5.10) 

Example: Let F = 2 b +l = 2^+1 = 17 

Using the proposed code 

A = 01010 = 2 3 - 2 2 + 2 - 1 = 5 mod 17 

by (5.9) 

A = (1) 2 3 + (0) 2 2 + (1) 2 1 + 0 (2°) 

= 8 + 2 = 10 
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and, 

A 

2A = 2(10) = 20 

+ 

2 

v 2 A + 2= 20 + 2 = 22 = 5 mod 17 = A. 

A 

To deduce from D the desired sum D, verify that 

A A A A i 

D = A+C = 2A+2 + 2C+2 = 2A+2C+4 mod 2+1 . 

Example: Let F = 17 

and 

A = 01010 = 5 A = ( 1 ) 2 3 + (0)2 2 + (1) 2 = 8+2 = 10 mod 17 

C = 00011 = 8 C = (D2 1 + (1)2° = 3 mod 17 

So 

/N A 

D = 2A + 2C + 4 = 2(10) +2(3) + 4 = 30 = 13 mod 17 

and, checking 

D = C + A = 8 +5 =13 mod 17. 
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If D can be expressed as 



D = 2 D + 2 mod 2 b + 1 (5.12) 

with D a b-bit number, then the b-bits of D are the 
b-LSB ' s of D [26] . 

Example: Let = 17 

and 

D = 01010 = 2 3 - 2 2 + 2 1 - 1 = 10 - 5 = 5 mod 17 

Since D can be expressed as: 

D = 2 D + 2 mod 17 

where 

D = 1010 = (1) 2 3 + ( 0 ) 2 2 + (1)2 + (0)1 = 8 + 2 = 10 

(check: D = 2 D + 2 = 2(10) + 2 = 22 = 5 mod 17) 

the 4 bits of D = 1010 are the 4 LSB's of D = 01010. 

There are two cases depending on the value of d^ in (5.9). 

1 - If d b = 1, then by [26] 

D = 2 b + D' = D' - 1 mod 2 b +l (5.13) 
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Example 



17 



Let F t = 



and 




A = 


01010 = 5 mod 17 


C = 


01011 = 7 mod 17 



one has 




A = (1) 2 3 + (0) 2 2 


+ (1) 2 1 + (0)1 = 8+2 = 10 


C =~ (1) 2 3 + ( 0 ) 2 2 


+ (D2 1 + (1)1 = 8 + 2 + 1 = 11 


D = ( 1 ) 2 4 + ( 0 ) 2 3 
Comparing with 


+ (1)2 2 + (0)2 1 + 1 = 21 = 4 mod 17 


A 

D = 

one verifies that 


d b 2b + d b-l 2b ' 1 + ••• + d o 




d b - 1 



in these conditions 



A 

D = 


2^ + D' mod 17 


where, by (5.13) 
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A 

D' = D + l = 4 + 1 = 5. 



Then 



D = 16 + 5 = 21 = 4 = 5-1 = 4 mod 17, 



checking (5.13). Thus, 



D = (2A + 2C + 4) mod 2 b +l = (2D+4) mod 2 b +l 
D = ( 2C ' +2) mod 2 b + 1 



(5.14) 



and 



C = C' 



(5.14a) 



Example: Let F = 17 

A = 01010 = 5 
C = 01011 = 7 
D = A+C = =12 



one has 



A = 10 

A 

C = 11 



D = A+C = 21 
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so 



(1) D=2(A+C) +4 mod 2+1 

D = 2(10+11) + 4 = 42 + 4 = 46 = 12 mod 17 

A 

(2) D = (2 D + 4) mod 17 

D = 2(21) +4 =46 = 12 mod 17 

(3) In the previous example D' = 5 
so 



D = 2(5) + 2 = 12 mod 17 

The condition D = D' , in (5.14a), can be verified: 



A = 01010 = 5 

+ 



= 01011 = 7 



D = A+C 




0101 = -8 +4-2+1= -5 =12 mod 17 



Since 



D = (0)2 3 + (1)2 2 + (0 ) 2 1 +1=5 mod 17 (by 5.12) 



then 



D = D ' = 5 mod 17. 
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2 - If d b = 0 in (5.9), then [26] 

D = 2D' + 4 mod 2 b +l 



(5.15) 



and 



D = D' + 1 



(5.16) 



Example: 



Let F = 17, b = 4 



A = 00 111 

C = 01 Oil 



16 mod 17 
7 mod 17 



00 010 

D = 00 010 = -8-4+2-1 = -11 = 6 mod 17 



Since by (5.12) 



D = (0 ) 2 3 + (0 ) 2 2 + (1)2 + (0) =2 



and by (5.16) 



D' 



= 1 



checking. 



D = 2 D' + 4 mod 2 b +l by (5.15) 

D = 2(1) + 4 = 6 mod 17. 
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Notice, that, this way of performing addition results in an 
extra level of add, as in the case of l's complement arithmetic. 

In l's complement arithmetic, the output carry is 
added to the LSB. 



carry, complements it and adds it to the LSB. That is, 
this new arithmetic is only as complex as the l's complement 
arithmetic. 

There is a small amount of additional complexity due 
to the control bit, but this acts only as an inhibit signal. 

Example : 

(1) 01111 = +8+4+2+1 = 15 mod 17 



In this new mod(2°+l) arithmetic, one takes the output 




00011 = -8-4+2+1 =-9=8 mod 17 
1)0010 



00010 = -8-4+2-1 = -11 = 6 mod 17 



( 2 ) 



01111 = 15 mod 17 



10000 = 0 

01111 = 15 mod 17 



MSB = 1 inhibits the addition 



(3) 



01011 = 8-4+2+1 = 7 mod 17 



(+) 00100 = -8+4-2-1 = -7 mod 17 




10000 = 0 mod 17 
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In this last example note that the second add automatically 
produced the control bit indicator, for the special case 
of zero. 

How does one convert from a binary coded representation 
of numbers to this new representation? 

The code conversion between a binary representation 
and this new code falls into two cases. 

Let M be a number which is represented in both codes. 

Let 



m, m. i ... m (5.16) 

b d— 1 o 

be the binary representation of M. 

Let 



m^ ... m Q (5.17) 

be the new representation. 

A 

Also, let M be the number represented by interpreting 
(5.17) as a binary code. The conversion rules are as 
follows : 

A 

1) If M = 0, then m fa = 1, M = 0, and xi^ = 0 for 
K = 0, 1, ... b-1. 

This is a special case and is done separately. 

2) If M ? 0, then in^ = 0 and 

M = (2M + 2) mod 2 b + 1 . (5.18) 
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Conversion from the new code to binary is implemented 

A r_ 

by forming 2M + 2 and comparing this sum to 2 . 

u u 

If the sum is larger than 2 ° , then 2 D +1 is subtracted 
to give the proper binary representation of M. 

Example : Let F = 17 

M = 01011 new code = 2 3 -2 2 +2+l = 7 mod 17 

/N 

M = 8 + 2 + 1 = 11 

By the above rule 

/\ 

2M + 2 = 2(11) + 2 = 24 = 7 mod 17 



and 



7 = 00111 binary representation of M. 

If the binary representation is given the sum 

M + 2 b - 1 (5.19) 

is formed. If the result is odd, 2 +1 is subtracted; and 
finally this result is right shifted one place. The resulting 
b bits are the b LSB's 



m b-l m b— 2 * * * m o * 
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Example: Let F fc = 17, b = 4. 

Given the binary representation of M = 11111 = 14 mod 17 
to obtain new code: 

(i) form 

M + 2 b -l = 14+16-1 = 29 

the result is odd then by the above rule. 

(ii) 

29-17 = 12 

(iii) 

12 

— = 6 00110 binary representation 

(iv) new code 4 LSB's of the binary obtained in 
(iii) / i.e., 

new code 00110 = -8+4+2-1 = -3 = 14 mod 17 

Notice that the code described so far, due to McClellan [26] , 
is a special case of a more general class of code trans- 
lations discussed by Leibowitz [27] . 

These translations involve the one-to-one representation 
of a number A in the ring of integers modulo F^, as the 
binary number corresponding to 

R A - 1 mod F (5.20) 

where R is any integer in the ring with an inverse [27] . 
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Notice that the code representation of McClellan [26] 
corresponds to the case of [27] 

r = 2 b_1 + 1 mod F t (5.21) 

Example: Let F fc = 2 b + 1 = 2 4 + 1, b = 4 

then 

R = 2 4 " 1 + 1 = 2 3 + 1 = 9 mod F 

Using (5.20) , for A = 5 mod 17 the code will be given by 
RA-1 = 9(5) - 1 = 44 = 10 mod 17 
10^^ = 01010 code 

checking, using (5.7) 



01010 = +8 -4 + 2-1 = 10-5 = 5 mod 17. 

Notice that the simplest code translation corresponds to 
R = 1, for any value of b. Leibowitz [27] concentrates on 
the resulting binary arithmetic for the code translation 
corresponding to R = 1. 

Recall that to represent all integers in the ring modulo 

F requires (b+1) bits. The additional bit is required in 

" ]d 

order to represent the number 2 = -1 mod F fc . 
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In order to overcome the problem of performing binary 
arithmetic with this additional bit, one allows the addi- 
tional bit to be a 1 only when the number to be represented 
is a zero. 

One way to do this is achieved by subtracting 1 from 
the normal binary representation of every integer in the 
ring and corresponds to the above set of code translations 
with R = 1 . 



Example 



Let F 



t 





17, b = 4 



Normal value 



Binary representation Diminished -1 

value (R = 1) 



9 (-8) 
IOC- 7) 
ll(-6) 
12 (-5) 
13 (-4) 
14 (-3) 
15 (-2) 
16 (-1) 



0 

1 

2 

3 

4 

5 

6 

7 

8 



00000 



00011 

00100 

00101 

00110 

00111 

01000 

01001 

01010 

01011 

01100 

01101 



00010 



00001 



01111 



OHIO 



10000 



1 

2 

3 

4 • 

5 

6 

7 

8 

9 (- 8 ) ( 5 . 22 ) 

10 (-7) 
ll(-6) 

12 (-5) 

13 (-4) 

14 (-3) 

15 (-2) 

16 (-1) 

0 



134 



In the diminished -1 number representation the b-least 
significant bits (LSB's) indicate the normal value of the 
number. The numbers from 1 to 2^ are represented in order 
by the binary numbers from 0 to 2^-1. 

Using this representation, the arithmetic operations 
necessary to perform convolution by FNT's, will be discussed 
next. 

1. Negation 

It can be seen from (5.22) that each of the negative 
numbers (>2^ ^ = 8) is the b-LSB's complement of its 
positive counterpart. 

Example : Let F^_ = 17 

Diminished -1 value Binary representation 

A = 6 00101 

-A = -6 = 11 01010 

Notice that if the MSB is 1, the negation is inhibited. 

Thus, the negative of a nonzero number in the diminished 
-1 representation is the complement of its b-LSB's. 

2 . Addition 

To perform addition of two numbers represented as 
(A- 1 ) and (B-l) 



(A-l) + (B-l) = (A+B-l) -1 
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and thus 



(A+B-l ) 



[ (A-l) + (B— 1) ] + 1 



(5.23) 



Since the (b+1; bit of the addends is used only to 



inhibit addition if an addend is zero, addition of nonzero 
addends involves only the b-LSB's. 



to the sum of two diminished (-1) numbers to provide a 
correct result. When a carry is generated from the b-bit 
sum, a residue reduction modulo F^ requires the subtraction 



necessary. Thus to add to numbers in diminished -1 
representation : 

1) If the MSB of either addend is 1, inhibit the 
addition and the remaining addend is the sum. 

2) If the MSB of both addends are 0, ignoring the MSB, 
add the b-LSB's, complement the carry and add it to 
the b-LSB's of the sum. 

The (b+1) bit or MSB of the resulting Siam is the carry 
out of the b*"* 1 bit. 

Example: Let F^ = 17, b = 4 

1. Diminished -1 value Binary representation 



Equation (5.23) indicates that a 1 must be added 



of a 1 since 2 



b 



-1 mod F fc and no corrective addition is 



add 



3 



00010 



+ 



2 




00001 

tXOOll 



5 



00100 
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2 . 



add 



7 



00110 




10000 

00110 



3. 



add 13 
+ 

9 



22 = 5 mod 17 



01100 

01000 

00100 

^0 

00100 



4. 



add 



11 



17 = 0 mod 17 



01010 

00101 

0iiii 

10000 



3 . Code Translation 

Let B represent the binary representation of a given 

number and D its diminished -1 representation. 

To perform code translation from binary to diminished 

-1 representation, one does a diminished -1 addition of B 

b 

and the binary representation of 2 - 1 [27] . 



Example : 



Let F = 17, b = 4. 



Binary number 



B 

2 b -l 



= 00101 

= 01111 

00100 



= 5 



Diminished -1 value D = 00100 
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i.e., the binary number B = 00101 = 5 is represented in 
diminished -1 representation by 

D = 00100 = (4) = (5-1). 



The translation from diminished -1 representation 
to binary representation, is performed by complementing the 
MSB of D and adding it to the b-LSB's [27]. 

Example : 



Diminished -1 
Binary 



D 

B 




= 00011 



3 



Example : 



D 

B 



Qooo 



000 
0 



= 00000 



0 Diminished -1 representation 
Binary representation. 



4 . Subtraction 

One can perform subtraction in the dimished -1 
arithmetic by negating the subtrahend and adding it to 
the minuend. 



Example: 

subtract 



Diminished -1 
8 

6 



2 



Binary representation 



00111 

01010 




00001 
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5. Multiplication by Powers of 2 

In performing a multiplication if the multiplier or 
multiplicand are 0, as detected by the presence of a 1 in 
the (b+l)th bit, the multiplication is inhibited and the 
product is zero. 

To perform multiplication of diminished -1 numbers 
by powers of 2, notice that: 

(A-l) 2 = ( 2 A- 1 ) - 1 



and thus 



(2A-1) = (A-l) 2 + 1 (5.24) 

therefore, each multiplication by two involves a left shift, 
ignoring the MSB, and a corrective addition of a 1. If the 

-l_l_ 

bit shifted out from the b position is a zero, it is com- 
plemented and shifted into the LSB in order to accomplish 
the addition of a 1. If this bit is a 1, a subtraction of 
1 is also required to accomplish a residue reduction (2 = -1) . 
With the corrective addition of +1, these cancel out and a 
0 is shifted into the LSB. 

Thus for each factor of 2, a left circular shift 
of the LSB's is required and the bit circulated into the 
LSB is complemented. 
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Example : 



17, b = 4 



Let F = 



16 x 9 = 2^x9 = 2x2x2x2x9 = 8 mod 17 



MSB 



9 

2x9 


oliooo 

dim 




4x9 


o;oooi 




8x9 


oloon 

j 




16 x 9 , 


010111 


8 mod 17 



6 . General Multiplication 

The last operation required to carry out convolution 
with the FNT is a general multiplication by any two 
integers modulo F^. 

To perform a multiplication of the numbers A and 
B represented as (A-l) and (B-l) in diminished -1 number 
representation system, notice that 



(A-l) (B-l) = A-B - (A+B) + 1 

= (A* B-l) - (A+B-l) + 1 



and the desired result is 

(A. B-l) = (A-l) (B-l) + (A+B-l) - 1 (5.25) 

thus, to carry out such an operation, ignore the MSB's, 
perform a binary multiplication of the diminished -1 
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representation of A and B, add this result to the b-LSB's 
of the diminished -1 addition of A and B and then perform a 
residue reduction by a diminished -1 subtraction of the 
b-MSB's from the b-LSB's. 

Notice that this particular general multiplication 
scheme is not applicable with code translations other than 
that corresponding to R = ±1. As discussed previously, 

if the MSB of either the multiplier or the multiplicand is 
1, then the multiplication is inhibited and the result is 
set to zero. 

Example: Let F t = 17/ b = 4. 

Multiply 

13 

x 

11 

.143 = 7 mod 17 



binary multiply 



diminished -1 add 



binary add 



01100 

01010 

011000 

0110000 

01111000 

00110 

01111110 



01100 

01010 




residue 1000 

reduction 



00110 7 mod 17. 
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An alternate multiplication technique can also be considered. 
Assuming that the numbers are determined to be nonzero and 
a multiplication is required, a translation to normal binary 
coding is performed. Following a binary multiplication, a 
residue reduction by diminished -1 subtraction of the b- 
MS's of the product from the b-LSB's is carried out. The 
result is the desired product. 

Let F fc = 17, b = 4 



7 mod 17 



translate 



translate 



Since in most cases the translation from diminished -1 to 
binary will be simpler and faster then a general binary 
or diminished -1 addition, the second technique is the most 
preferable . 

To conclude part A of this section, one can say that 
Leibowitz [27] presents a binary arithmetic applicable to 




7 mod 17 00110 



Example : 

13 

multiply 

x 

11 

143 = 
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the exact computation of the FNT, that this diminished -1 
representation is mathematically simpler than that of 
McClellan [26]. Also the hardware presented in [26] to 
compute the FNT, and discussed in part B of this section, 
can be applicable to this new technique with the exception 
of the code translation. 

B. SOFTWARE AND HARDWARE REALIZATIONS OF THE FNT 

The Fermat Number Transform has been proposed as an 
aid to the rigid and precise computation of convolution for 
digital filtering applications. Since this transform does 
not require multiplications, it is considerably faster 
than the FFT [17] . 

In what follows, a discussion of a software implementa- 
tion of the FNT, due to Agarwal and Burrus [4], as well as 
the description of a special purpose hardware to compute the 
FNT, realized by McClellan [26], is presented. 

In their assembly language realization of the FNT, on 

the IBM 370/155 computer, Agarwal and Burrus define a 

b t 

binary arithmetic modulo F = 2 D +1, b = 2 . 

Notice that the IBM 360/370 series uses a 32-bit word 

length for fixed-point arithmetic, and therefore is well 

suited for the implementation of convolution using the FNT 
32 

modulo F ^ = 2 +1. 

It has two's complement representation of negative 
integers, i.e.. 



(-x) is represented as 2 



32 



- x 



(5.26) 
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Since in arithmetic mod F one wants negative integers 

32 

to be represented as a complement of 2 + 1, i.e. 

32 

(-x) is to be represented as (2 +l-x) (5.27) 

One adds 1 whenever a negative number is encountered in 
data. 

As noted before with b bits, the quantity 2 = -1 mod F 

has no representation. Thus in the 370/155, one cannot 
32 

represent 2 = -1. If a (-1) is encountered in the data 

it is rounded either to 0 or to -2. If data is uncorre- 
lated, the probability that (-1) will appear after an 
arithmetic operation during the various stages of the FNT 
computation is roughly 

-32 ~ -10 

2 - 10 ■ LU (5.28) 



To add two 32-bit integers modulo F^, the logical add 
instruction (ALR) is used which adds the two integers and 
sets the condition code depending on carry or no carry. 

After the logical add instruction, a conditional branch is 
taken depending on the condition code. If the condition 
code indicates a carry bit in the logical add operation, one 
is subtracted from the result, otherwise it is left unchanged. 
This sequence of operations completes one addition modulo F,- . 
Similarly, subtraction modulo F 5 is performed using a 
logical subtraction (SLR) instruction followed by a conditional 
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branch instruction. To multiply a number by 



2 K mod F 5 , 0 < K < 32 

the number is loaded in the odd register of an even-odd 
fair of registers. The even register is filled with zeros 
and this double register is shifted left by K positions 
using a shift left double logical (SLDL) instruction. 

After the shift operation, the even register is subtracted 
from the odd register, modulo F^. This sequence of instruc- 
tions completes multiplication by 2 mod F^. 

To multiply a number by 

2~ K mod F 5 - , 0 < K < 32 , 

the number is loaded in the even register of an even-odd 
pair of registers, the odd register is filled with zeros 
and this double register is shifted right by K positions 
using a shift right double logical (SRDL) instruction. 

After the shift operation, the odd register is subtracted 
from the even register, modulo F 5 * This sequence of 
operations completes multiplication by 2 mod F^. To 
multiply two numbers mod F^ requires a somewhat larger 
number of operations. First one determins the signs of 
the two integers and multiply the signs. If any of the 
numbers is detected as negative, its absolute value mod F^ 
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is taken by taking its two's complement and adding one to 
it. Their absolute values are multiplied (MR) to obtain 
a 64-bit double-register product, then the even register 
(higher order 32 bits) is subtracted from the odd register 
(lower order 32 bits) mod F^. Finally, if the product of 
signs was detected negative, one multiplies the result by 
(-1) by taking its two's complement and adding one to it 
[5.27] . 

Assembler subprograms were written [4] to compute Fast 

Fermat Number Transforms and inverse Fermat number transforms 

1 6 

for an sequence length from 2 to 2 , taking x as a power of 
2. A decimation in frequency algorithm with normally 
ordered input and bit reversed output was used for the 
computation of the Fast Fermat Number Transform [4] . A 
decimation in time algorithm with bit reversed input and 
normally ordered output was used for the computation of the 
fast inverse FNT. 

For both of these subprograms, the only multiplications 
required were by a power of 2, which were implemented as 
discussed above. Two more subprograms were written to 
compute fast FNT's and inverse FNT's for length - 128 
sequences, using a = / 2 given by (4.16). 

Multiplications required to multiply the two transforms 
are general multiplications mod F^ and are performed as 
discussed earlier. 

To cyclically convolve sequences longer than 128, two- 
dimensional convolution schemes were used. 
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One such scheme was 2 by 128 convolutions for length 
256 sequences discussed in section 10 of [18] . Another 
program was written to convolve long one-dimensional 
sequences using two-dimensional FNT, as discussed in 
Appendix B. In Section 6, results of the comparison with 
the FFT are presented. 

The special attributes of the FNT led several researchers 
to consider seriously special-purpose hardware implementa- 
tions of the transform. The machine constructed by McClellan 

[26] , is an implementation of a 64-point/ 16-bit FNT 
1 6 

(modulo F^ = 2 +1). This hardware system applies, the 

second coding scheme, described in part A of this section, 
to the logic design of the butterfly of the FNT algorithm. 

The fast FNT algorithm implemented is a radix-2 constant 
geometry decimation in frequency (DIF) decomposition of 
the FNT. 

Fig. 1 shows a flow diagram of this algorithm for a 
16-point transform [38] . Although the constant geometry 
structure does not allow in place calculation of the trans- 
form, it simplifies the memory addressing because the addressing 
does not vary from stage to stage. The price one pays for 
this simplification is a doubling of the memory size 
required for the transform. However, 128 words per chip is 
a convenient level of integration with ECL (emiter coupled . 
logic) , thus making the constant geometry structure 
attractive [26] . 
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FIGURE 1. Flow Diagram of a Radix-2, 16 Point, Constant 
Geometry FFT Algorithm Using the Decimation 
in Frequency Structure 
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Fig. 2 is a block diagram of the complete system showing 
the four major subsystems. The computational element (CE) 
is a radix-2 DIF butterfly for the fast FNT algorithm; the 
memory element contains 128 x 17-bit words for use as inter- 
mediate storage during the computation of the transform; 
the control element is a hardwired implementation of the 
fast FNT algorithm [26] ; and the input/output (I/O) section 
provides the interface with the fast digital processor (FDP) . 
McClellan states that the goal in building this hardware was 
to construct a CE that would operate at a clock rate of 40 
MHz. In order to achieve this speed, ECL 10K circuits were 
used. The basic gate in this logic family as a propagation 
delay of 2 ns, and thus these circuits are well-suited for 
very high-speed systems. Even with such high speed logic 
circuits, two levels of reclock and fast carry addition were 
used in the CE to realize a working system that runs 
reliably at 38 MHz [26J. 

Fig. 3 shows a functional diagram of the butterfly 
which consists of an adder, a subtracter, a rotator, input 
buffer registers R A and Rg, reclock registers R^ R x and 
Ry and an output register R^,. Register transfers are made 
at each clock pulse, so that data are always flowing through 
the CE as would be the case in a pipelined fast FNT. 

As the timing diagram in Fig. 4 shows, the output of 
the butterfly is only written into memory from R z at t 4 and 
t 5 - During the other clock epochs the contents of R z may 
be changing but this does not affect the algorithm. 
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FIGURE 2. Block Diagram of the 64-Point FNT Hardware 
System Illustrating the Data Flow Between 
the Major Subsystems 
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Register Transfer Diagram of the CE of the 
FNT Algorithm. Path on the Left Implements 

A+B and the Path on the Right (A-B) 2 K 
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FIGURE 4 . Timing Diagram for the Internal Clocking 
Operation of the FNT Butterfly. Register 
Transfers at Each Clock Pulse are Also Shown 
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Recall that in McClellan's code, the rule for addition of 



two nonzero numbers , 



A 



[a 16 a 15 




and 



B = [b^g ... b Q ] is as follows: 

STEP1: Add the 16 LSB's of A and B with the 
carry in equal to zero 

STEP2 : Complement the carry out from step 1 and 

add it to the sum of step 1. 



carry must be inhibited. Finally if both A and B are zero 
the MSB of the sum is set to one. Fig. 5 shows a realization 
of the addition process. The structure of Fig. 5 is ineffi- 
cient in two respects. First of all, two 16-bit adders are 
required, although the second one is simple because one 
input is zero. Secondly the addition is very slow because 
the carry must propagate through the 16-bit adders. The 
use of carry look ahead (CLA) logic as in Fig. 6 will 
improve both situations. For details of the implementation 
using ECL building blocks see McClellan's paper [26]. 

Notice that the subtracter A-B, can be implemented 
by complementing B and adding it to A. 



If either A or B is zero (i.e., b^g 



or a 



= 0) , then the 
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FIGURE 5. Implementation of Addition Modulo the Fermat 
Number (2^+1) Using Two 16-Bit Adders and 
the New Coding Scheme 
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. Improvement of the Fermat Number Adder by 
the Introduction of CLA Logic to Produce 
the. End Around Carry Required in Fig. 5 
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The addition A+B completes the calculation of one output 
of the computational element (see Fig. 3) . The result is 
held in the register and then is moved to R z to be written 
back into memory. For the other output of the CE, the 
quantity A-B is stored in the reclock register R for 
subsequent rotation by a power of 

a = /2 given by (4.16), . 

a = /2 = 2 b//4 (2 b//2 -l) = 2 16y/4 (2 16//2 -l) = 2 4 (2 8 -l) = 2 12 -2 4 

(5.28) 

The rotation by ^2^ is split into two stages. 

In the first stage, the quantity X = A-B is multiplied 

_ 12 4 ' _ 

by /2 = 2 - 2 if K is odd. The v2 multiplier is merely 

a subtracter (5.28). 

A 2:1 multiplexer at the output of the subtracter 

selects whether the input is to be multiplied by /2 or 

by 1, and is controlled by the LSB of K [26] . The result 

of this calculation is stored in the reclock register R^. 

The second stage of the rotation is a multiplication by a 

power of 2, namely [K/2] . ([ ] denotes the greatest 

integer function.) This multiplication is implemented as 

a 16:1 multiplexer controlled by the upper four bits of 

the binary representation of the power of 2. The shifting 

network is followed by a 2:1 multiplexer which selects 

K 

which butterfly output (A+B or 2 *y) is to be stored in 
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R z and then written back into memory. This multiplexer 
is controlled by t 3 and its output is 

(2 K • Y) F 3 + W • t 3 

where Y and W are the contents of R^ and R w respectively. 

McClellan states that the logic for the computational 
element consists of 90 IC's, all of which are located on 
one board (18x16 in.). 

In Section VI, a comparison between the complexity of 
various basic operations involved in computing Fermat 
Number Transforms vis-a-vis the FFT, in software as well as 
hardware implementations, will be presented. The software 
and hardware implementations discussed there will be those 
described in this section. The results will show only the 
efficiency of these implementations, not all the possibili- 
ties of NTT in general. 
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VI. FERMAT NUMBER TRANSFORM VERSUS FFT 



The FNT provides an efficient and error-free means of 
computing cyclic convolutions. The purpose of this section 

is to compare this method with the standard implementation 

' ! 

of convolution. Results obtained both by Agarwal and 
McClellan, respectively in software and hardware implemen- 
tation of the FNT, are presented. 

Computation of the FNT of length N requires on the order 
of N log 2 N additions, bit shifts and subtractions but no 
multiplications. The only multiplications required for 
our FNT implementation of cyclic convolution are the N 
multiplications required to multiply the transforms. This 
is a very efficient technique for computing convolution, 
but unfortunately, the maximum transform length for an FNT 
is proportional to the word length of the machine used. 

Agarwal and Burrus [ 17 ] showed that a very practical choice 

32 

of a Fermat number for this application is F^ = 2 +1, and 

that the FNT mod F^ can be implemented on a 32-bit machine, 
namely the IBM 360/370 series computers. 

Suppose one wants to calculate the convolution of two 
sequences x(n) and h(n) having b-j^ and b 2 bit representations, 
respectively, and that the sequence length of both is N. 

Then the output y (n) , given by (3.2), would have at the 
most [17] a 



b l + b 2 + log 2 N 



( 6 . 1 ) 
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bit representation. To obtain the correct result b, the 
number of bits of the output, should be [17] 

b 1 b ! + b 2 + log 2 N (6.2) 

Notice that a better bound on the output can be achieved 
[4] . 

Roughly speaking, one needs twice the number of bits to 
carry out the convolution using the FNT as compared to the 
fixed point FFT implementation of the convolution. But in 
the DFT, every data point is treated as a complex number 
[17] and therefore requires two words, one for the real part 
and one for the imaginary part. Thus in effect the hard- 
ware requirement for two transforms are about the same. 
Although for real data it is possible to make use of the 
symmetry properties of the DFT, they require extra computa- 
tion and for the purpose of comparison it will be ignored, 
although Agarwal and Burrus had taken this into account for 
their IBM 370/155 implementation. In fact, they assumed, 
in the FFT implementation, each data point is represented 
by a b/2 bit real part and a b/2 bit imaginary part. One 
b/2 bit complex addition is equivalent [17] to two b/2 bit 
real additions, which are equivalent to a b-bit addition 
modulo F fc . Thus the complexity of addition/subtraction is 
the same in both the transforms. Similarly, it can be 
shown [4] that a b/2 bit complex multiplication is equivalent 
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Computation of the 



to a b-bit multiplication modulo F^. 

FNT requires multiplications by a power of 2, which 
implemented as bit shifts and subtractions become much 
simpler operations compared to complex multiplications 
required in the FFT implementation. 

To compute a length N fast FNT, N log 2 N additions/ 
subtractions, and (N/2) log 2 (N/2) "multiplications" by some 
powers of 2 which are implemented as bit shifts and sub- 
tractions. To compute the convolution using the FFT, most 
of the time is taken in computing the complex multiplica- 
tions required to compute the complex multiplications 
required to compute the transforms. A comparison with the 
FNT reveals that these complex multiplications are replaced 
by bit shifts and subtractions which are much faster opera- 
tions. This results in considerable computational savings 
in the implementation of convolution. 

The convolution required to multiply the two transforms 
is about the same for both the implementations. 

To convolve long sequences using the two-dimensional 
FNT (Appendix B) , the computational effort increases by, 
at most, a factor of 2 [4]. Still, the FNT implementation 
of convolution is much faster as compared to the FFT 
implementation. 

Fermat number transforms have some additional advantages 
over the FFT. First, the FFT implementation requires storing 
all the powers of a (see 3.7 ) requiring a significant amount 
of storage which may be an important factor for a small 
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minicomputer or a special purpose hardware implementation. 
Second, fixed-point FFT implementation introduces a signifi- 
cant amount of round-off noise at the output, 6-8 bits 
depending on the data [39] . This degrades the signal- 
to-noise ratio during the filtering operations. The FNT 
implementation is error free, the only source of error is 
input A/D quantization. 

Timings for various implementations of convolution and 
their comparison with the FFT implementation are shown by 
Agarwal and Burrus [4] ; 



N 


FFT 

msec 


FNT 

msec 


32 


16 


3.3 




64 


31 


7.4 




128 


60 


16.6 


* 


256 


123 


40.0 


** 


256 


123 


80.0 


* * * 


512 


245 


166.0 


* ** 


1024 


530 


340.0 


*** 


2048 


1260 


720.0 


*** 



* using a = /2 

** using 2 by 128 convolution 

*** using the two-dimensional FNT 



To compute these timings it is assumed that the transform 
of the h sequence has been precalculated. Thus timings 
are for computing x transforms, multiplications of the 
transforms, and their inverse transforms [4]. 
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For cyclic convolution lengths up to 256, the FNT 
implementation of convolution have a factor of 3 to 5 
speed advantage over the FFT implementation. It takes 
roughly 13-14 usee to compute one butterfly in the compu- 
tation of fast Fermat number transforms [4] . Timings for 
the computation of fast Fermat number transforms, are fairly 
well modeled by multiplying this time by the number of 
butterflies required in the computation. An assembler 
subprogram was written to compute one butterfly for the FFT 
algorithm [4]. The timing for this was 68 usee, and this 
explains the difference in the timings of the FFT and FNT. 
Agarwal claims that since assembler subprograms were not 
optimized for time, it should be possible to further reduce 
their timings by 10 to 20 percent. Also, to compute one 
butterfly of the fast FNT's, three add/subtract logical 

instructions are required and since the arithmetic is done 
32 

mod 2 + 1 , these three instructions are followed by three 

branch instructions (to take into account the carry bit) . 

32 

If the hardware was designed to do arithmetic mod 2 +1, 

these instructions could have been avoided resulting in a 
significant reduction in the computation time [4] . One 
objective of the construction of FNT convolver by McClellan 
was to evaluate the total system cost of an FNT convolver 
versus a pipeline FFT convolver, primarily in the speed 
regime applicable to radar signal processing. 

The signal bandwidths encountered in radar signal 
processing (10-30 MHz) require a pipeline architecture for 
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either the FNT or FFT [40] . Typically, the overlap-save 
version of high-speed convolution is employed for real-time 
processing of the radar returns and a 50 percent overlap of 
the input data is common [26], Furthermore, the length of 
the convolution to be implemented is assumed to be large 
(e.g., 512 or greater). Two cases will be considered: 
a length 1024 convolution of real data and a length 1024 
convolution of complex data. 

Four measures of hardware complexity are the basis of 
comparison [26] : 

the number of butterflies for output point, 

- the number of reference spectrum multiplies for 
output point, 

- the total amount of interstage delay line memory in 
the forward and inverse transforms, 

and the total amount of reference spectrum memory. 

The FFT implementation will be considered first. For either 
real or complex data, it is assumed that the FFT implemen- 
tation employs an 11-stage radix 2 pipeline FFT in both the 
forward and inverse transform. Notice that for real data, 
it is possible to do a length N transform with one length 
N/2 transform and some, overhead to combine the real and 
imaginary parts [38] . However the overhead amounts to an 
additional butterfly so there is little, if anything to gain 
by using this fact in a pipeline FFT [26] . 

For the case of a length 2 11 = 2048 pipeline FFT convolver 
the number of butterflies per output point is 
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2 log 2 N 



22 



(6.4) 



assuming 50 percent convolution overlap. Likewise, two 
reference spectrum multiplies must be done per output point 
[26] . The amount of interstage delay line memory can be 
calculated from the formula [26] 

I DM = | (r+1) (6.5) 

where r is the radix of the transform. 

Thus, for two radix-2 pipelines, the total is 

I DM = |(2 + l)-2 = 3N = 6144 = 6K 

words of memory. • Finally, the reference spectrum requires 

2K words of memory [26] . A pipeline FNT structure is 

identical to the pipeline FFT except in the butterfly where 

rotation by e ^- 2lTK / N (i n the FFT case) is replaced by multi- 
\/“K 

plication by v 2 . 

Thus many of the results quoted above are applicable to 
the FNT. Since the FNT naturally processes real input data, 
the cases of real and complex convolution require different 
realizations. In both cases, however, a two-dimensional 
implementation of the convolution is required [31] . 

The length 1024 convolution of real signals can be 
implemented with 64 x 64 transforms [26] . 



McClellan claims that for the FNT convolver processing 
real data, the following results apply: 

12 

total number of butterflies = 9 x 2 36 butterflies per 

output point 

12 

total reference function multiplies =2 4 multiplies per 

output point 

total interstage delay memeory = 6*4 K 

amount of reference function memory = 4K. 

If the signal to be convolved is complex then one possible 
implementation is to handle the real and imaginary parts of 
data separately. The number of butterflies per output 
point, the interstage delay memory, and the reference spec- 
trum are all doubled. However, the number of real multipliers 
for the reference function multiply is quadrupled because 
the multiplications are now complex [26] . These results 
can be summarized, assuming 1024 convolutions, by 





FFT Real or 
Complex Data 


FNT Real 
Data 


FNT Complex 
Data 


Butterflies 


22 


36 


72 


Reference 

Multipliers 


2 


2 


16 


Interstage 
Delay Memory 


6K 

complex words a 


6 • 4K 

real words 


12-8 K 

k real words 


Reference 


2K 


4K 


8K 


Spectrum 

Memory 


complex words 


real words 


real words 



a One complex word will contain approximately 27 bits 
for typical high-precision radar applications. 

b One real word contains 33 bits for FNT. 
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Notice that the FNT always requires more memory and more 
computational elements than the FFT. Hardware savings are 
possible because most of the hardware cost of the FFT is 
concentrated in the butterfly elements (up to 80 percent) 
[26] and because the FNT butterfly requires from one-third 
to one-sixth the hardware of an FFT butterfly. These 
remarks apply to the FNT where the data to be convolved are 
real, but where the data are complex, the situation becomes 
much worse because all measures of hardware complexity are 
increased by a factor of 2 or 4. 

McClellan conclude that the FNT is a useful alternative 
to the FFT if the signal to be filtered is real and the 
computational elements are a major part of the overall 
system cost as in a pipeline architecture. Furthermore, 
for short length convolution (e.g., length 64) and for 
two-dimensional convolution the savings may be significant. 



i 
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VII . CONCLUSIONS 



It has been shown that, by working in a finite field or 
ring of integers modulo M, a large class of transforms 
exist that have the cyclic convolution property, i.e., the 
transform of cyclic convolution of two sequences is equal 
to the product of their transforms. These transforms are 
called Number Theoretic Transforms (NTT's) and they are a 
computationally efficient approach to performing the dis- 
crete convolution function. These NTT's are truly digital 
transforms, taking into account the quantization in amplitude 
and the finite precision of digital signals. They bear the 
same relation to digital signals as the DFT does to discrete- 
time or sample data signals and the Fourier or Laplace 
transforms do to continuous time signals. 

When working with digital machines, the data are avail- 
able only with some finite precision, and therefore, without 
loss of generality, the data can be considered to be inte- 
gers with some upper bound. To compute convolution in this 
digital domain, operations in the complex number field of 
the continuous domain can be imitated in a finite field, or 
more generally, in a finite ring of integers under additions 
and multiplications modulo some integer M, with an integer 
a of order N, replacing e ”? * 27r ^ N in the Discrete Fourier 
Transform (DFT) . 

In this ring, when two integer sequences x(n) and h(n) 
are convolved, the output integer sequence y(n) is congruent 
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to the conventional convolution of x(n) and h(n), modulo M. 

In the ring of integers modulo M, conventional integers 
can be, unambiguously, represented if their absolute value 
is less than M/2. If the input integer sequence x(n) and 
the filter sequence h(n) are so scaled that |y(n) | never 
exceeds M/2, one would get the same results by implementing 
convolution in the ring of integers mod M as that obtained 
with normal arithmetic. 

By special choices of the length N, the mod M, and 
the value a, it is possible to have transforms that need 
only word shifts and additions but no multiplications, 
that have an FFT type algorithm, that do not require storage 
of complex values of a and that have no round-off errors. 

It has been shown that Mersenne transforms with 
M — p = 2^-1, q a prime, and a = -2 , have the transform 
length equal to N = 2q and therefore do not have an FFT 
type fast computational algorithm. 

The best known number theoretic transform is the 

2 *" . 

Fermat Number Transform (FNT) , where M=2 + 1, t a positive 

integer. For FNT's with a prime or composite modulus it 
was verified that a = 2 or a power of 2 is possible, for 
sequences up to N = 2 t+1 . This is a very desirable situa- 
tion since N is highly composite allowing an FFT type 
algorithm and all multiplications by powers of a are simple 
word shifts. If a = /2 is used then sequences of length 
N = 2 t+ ^ are possible, thus increasing the maximum sequence 
length permissible. 
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Assembler programs on the IBM 370/155 computer, written 
by Agarwal, showed that for cyclic convolutions length 
up to 256, the FNT implementations of convolution have a 
factor of 3 to 5 speed advantage over the FFT implementa- 
tions. The reasons for the speed up are: 

1 - The Fermat number transform requires no multiplica- 

tions and, therefore, the implementation of 
convolution requires only N multiplications for 
an N point convolution. The number of additions 
and subtractions (together) for a convolution is 
2N log 2 N and there are N log 2 N required "multipli- 
cations" by a power of two. 

2 - Only real operations are required. This buys about 

two to one savings over the FFT requirements. 

3 - The Fermat number transform is able to compute an 

exact convolution thus allowing a program to avoid 
the need for either floating point arithmetic or 
overflow checks or other precautions. 

The computation required to multiply the two transforms is 
about the same for both implementations. To convolve long 
sequences using two dimensional FNT, the computational effort 
increases by, at most, a factor of 2. Still the FNT imple- 
mentation of convolution is much faster as compared to the 
FFT implementation. 

Fermat number transforms have some additional advantages 
over the FFT. First, the FFT implementation requires storing 
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all powers of a requiring a significant amount of storage 
which may be an important factor for a small minicomputer 
or a special purpose hardware implementation. 

Second, fixed-point FFT implementation introduces a 
significant amount of round-off noise at the output 6-8 
bits depending on the data [38] . This degrades the signal- 
to-noise ratio during the filtering operations. The FNT 
is error free, the only source of error is input A/D 
quantization. 

In the realm of radar signal processing the potential 
for higher throughput is worth exploring. For this reason 
McClellan designed and constructed a small prototype FNT- 
convolver. An important element in the design of the FNT- 
convolver was a new coding scheme for the data, although 
simpler codes are possible. The experience derived from 
designing and building this hardware serves as the basis for 
estimates of the site of large pipeline FNT convolvers, for 
use in radar matched filtering applications. The result of 
the hardware comparison of the FNT versus a pipeline FFT, 
indicates that the anticipated savings of the FNT can be 
realized for small systems (e.g. , length 64 convolution) , 
when the signal to be filtered is real. However, in larger 
systems where one must use two-dimensional Convolution to 
implement one dimensional convolution, the savings in 
multiplier hardware are offset by increased transform size 
and the corresponding increase in memory size and reference 
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spectrum multiplier hardware. In this case, when the signal 
to be filtered is real, the FNT still offers a potential 
savings in the amount of hardware versus the FFT. When the 
signal is complex valued the amount of FNT hardware approxi- 
mately doubles and is much greater than the pipeline FFT. 

In the implementation of two-dimensional convolution there 
is no penalty due to increased memory size and the savings 
in multiplier hardware will translate into savings for the 
overall convolver system. 

It was shown that the main drawbacks of' FNT' s is a 
rigid relationship between word length and the obtainable 
transform length and a limited choice of possible word lengths. 
This last point is especially significant, since FNT's are 
restricted to word sizes equal to q = 2^, t an integer. As 
q increases very rapidly with t, the choice of possible word 
lengths is very limited, and most practical digital filtering 
applications, when implemented with FNT, are constrained to 
word lengths of 16, 32 or 64 bits. If the dynamic range 
required for convolution does not correspond to q = 2^ 
choosing the next higher value of q may result in a signifi- 
cant waste of computing power. 

Various solutions to these problems involving either 
two-dimensional techniques, the use of "the Chinese Remainder 
Theroem" , or other NTT's has been discussed. 

Along these lines, it was shown that transforms over 
the Galois Field GF(p 2 ), can be found which do not introduce 
round— off errors and which can be used to compute 
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information-lossless convolutions of sequences of complex 
numbers, by a FFT type algorithm. A disadvantage of this 
transform is that multiplications by powers of the primitive 
element (a) is not as simple as in Mersenne or FNT's. 

It was verified that a Fourier-like transform in GF (p) 
where p is a prime of the form A r = 3x2 n +l, n a positive 
integer, is possible. This transform increases the variety 
of methods and the digital word lengths that can be used 
for computing the convolution of integers beyond the above 
mentioned Fermat or Mersenne Number Transforms. Also, a 
special NTT that can be computed using a high-radix fast 
Fourier type algorithm, defined on arithmetic modulo primes 
of the form (2 n -l)2 n +l, was discussed. 

Complex Mersenne transforms that. can be computed without 
multiplications were presented. These transforms are very 
promising for computing convolutions because they can be 
partly computed with the FFT type algorithms and some of 
the operations can be performed on words of reduced length. 
Complex Mersenne transforms also have the advantage of 
permitting operations on transform lengths and convolution 
lengths up to four times longer than is possible with 
conventional Mersenne transforms. 

These results have been extended to cover the case of 
transforms oerating in a ring modulo a pseudo Mersenne 
number or submultiple of such a number. It was verified that 
some of these transforms have a highly composite transform 
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length and therefore can be computed with an efficient 
FFT-type algorithm. In the same way pseudo FNT's defined 
in a ring which is a submultiple of a Fermat number and 
can be considered as a generalization of FNT's. allow a much 
wider choice of possible word lengths, and therefore are 
well adapted for evaluating convolutions. As an extension 
the case of complex transforms was considered. 

Finally, complex pseudo FNT's were presented, that 
allow a length double that of FNT's, and part of the calcu- 
lations to be performed on words of reduced length. Some 
of these transforms have a highly composite number of terms 
and are therefore well suited for computing complex convo- 
lutions with an efficient FFT-type algorithm. 

Recently 132] a number of three bit primes have been 
discovered which make possible very efficient fast number 
transforms approaching that of the FNT, but permitting much 
larger transform lengths suitable for convolving the large 
arrays met in picture processing and electron microscopy in 
particular. If a method of implementing arithmetic modulo 
three bit primes comparable to that already developed for 
implementing arithmetic modulo a Fermat number, could be 
found, very efficient fast convolution would become possible 
for a very large range of array sizes. 

Rader and Brenner [41] have introduced an alternative 
form for the FFT. This new form has the advantage that none 
of the multipliers is complex, but in the usual complex 
field, most are pure imaginary. It has the disadvantage 
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that one must divide by very small numbers and therefore 
aggravate any quantization noise problems. This new 
algorithm may be applied to the complex NTT without this 
disadvantage. 

Winograd [42] has shown how to perform a short-length 

D.F.T. of length N = p or p r (where p is a prime) in a very 

efficient way. Winograd' s algorithm uses the fact that 
N 

a = 1 and requires that all operations be performed in a 
field. Hence, providing one chooses the modulus M to be a 
prime number, one may perform the NTT by using Winograd 's 
algorithm. 

A final remark is in order. Whether or not an engineering 
method is useful becomes clear only when that method is 
evaluated by the wide community of potential users, who 
consider it in relation to their needs. Since so many 
engineers and programmers are not familiar with number . 
theory it is an open question whether NTT's algorithms will 
ultimately prove to be of great or small importance. 
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APPENDIX A 



TWO DIMENSIONAL CONVOLUTION FOR CONVOLVING LONG SEQUENCES 

Arithmetic mod F (a Fermat number) can be implemented 
using b = bit representation of integers. In Section IV, 
it has been shown that the maximum length of sequences which 
can be cycled convolved using the FNT with a = 2 is N = 2b 
and therefore the length of sequences which can be convolved 
is proportional to the word length in bits. Thus for long 
sequences the word length requirement may be excessive. 

Rader [3] suggested using a two dimensional convolution 
scheme to convolve long one-dimensional sequences and Agarwal 
and Burrus [17,18] presented such a two dimensional convolu- 
tion scheme. Using this scheme, cyclic convolution of 
length N = LP is implemented as a two dimensional cyclic 
convolution of length 2LxP. 

This two-dimensional cyclic convolution can be implemented 

using a two dimensional FNT. Using this two dimensional 

scheme, the word length required is proportional to the 

square root of the length of the sequences to be convolved 

2 

which would give for a maximum sequence length 8b rather 
than 4b. I.e., for a computer word length b = 64 

N for a = /2 would be 32768 1 

In the following pages an example illustrative of the two 
dimensional convolution using arithmetic modulo a Fermat 
number and FNT's is worked out. 
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Let x(n) and h(n), n = 0, 1, ... N-l, be two sequences 
which need to be cyclically convolved. Let N = LP. 

We construct two (2IxP) two-dimensional sequences 
h(i,j) and x(K,Jl) from x(n) and h(n) respectively as 
shown below. 



h(i,j) = h ( jL + i - L) (1) 

i = 0/1/ .../ 2 L— 1 
j = 0,1, . . . , p-1. 



And 



x ( £L+K) 

x(K,£) 



0 



K = 0,1, . . . , L— 1 



( 2 ) 



K = L, L+l , . . . , 2L-1 
Si = 0,1, . . . , p-1 



Example: 

Two dimensional convolution using FNT 
Let 

x (n) = (2, -2, 1,0) and h(n) = 

Using Fermat number transforms modulo F : 
x (n) = (2,15,1,0) and h(n) - 



s. 



( 1 , 2 , 0 , 0 ) 



17, 



( 1 , 2 , 0 , 0 ) 
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Note that the second element is changed from -2 to 15. 
Notice that N = 4 = Lx P = 2x2. 

One constructs two (2L x p) two dimensional sequences 
h(i,j) and x(K,£) from x(n) and h(n), according to (1) 
and (2) . 

It turns out 





0 1 




2 1 




0 2 




15 0 


h(i,j) = 




x (K, £) = 






1 0 




0 0 




2 0 




0 0 



Now taking two dimensional transforms of h and x yields H 
and X. 



p-1 2L-1 

H(m,ri) = l l h(i,j)a 2 ^ m a ^ n 
j=0 i=0 



(3) 



m = 0,1/ . . . , 2L-1 

n 0,1, ..., p™ 1 




is an element of order 2L 
is an element of order p 



and a similar expression for x(m,n) . 

Remembering that one is using F = 17 as the modulus 



a 



2L 



a , 




= 13 



and 



16 



This can be found by noting that 3 and 6 primitive roots 
of the ring in consideration will generate all the positive 
integers in the ring (2.16). 

Also, by (2.20), if a is a root of order N then 

a is of order N/K if K|N (K divides N) 
a is of order N if N and K are relatively prime. 

Thus, with these considerations in mind one finds 

a 2L = a 4 = 316/4 = 3 4 = 13 (mod 17) 

a = 13 order 4 

a = a 9 = 3 16 / 2 = 3 8 = 16 (mod 17 ) 

P ~ 

a = 16 order 2 



Now one can return to the calculation of the two dimensional 
transforms. The first term will be 



H ( 0 , 0 ) = h (0 , 0 ) 13 
+ h (1 , 0 ) 13 
+ h (2 , 0 ) 13 
+ h (3 , 0 ) 13 



( 0 ) ( 0 ) 
( 1 ) ( 0 ) 
( 2 ) ( 0 ) 
(3) (0) 



16 ( 0 ) ( 0 ) 
16 ( 0 ) ( 0 ) 
16 ( 0 ) ( 0 ) 
16 ( 0 ) ( 0 ) 



+ h (0 , 1) 13 
+ h (1 , 1) 13 
+ h (2 r 1) 13 
+ h(3 ,1) 13 



( 0 ) ( 0 ) 
( 1 ) ( 0 ) 
( 2 ) ( 0 ) 
(3) (0) 



16 (1 ) ( °) 
1 6 (D(°) 

16 (1) (0) 
16 (1)(0) . 
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X • G • / 



H (0 , 0 ) 



= h (0 , 0 ) + h (0 , 1) + 
+ h (1, 0) + h(l,l) + 
+ h (2 , 0) + h (2 , 1) + 
+ h (3 , 0) + h(3,,l) 



0 + 1 + = 6 (mod 17) 



0 + 2 + 



1 + 0 + 
2 + 0 



Now constructing a table that will help in the evaluation of 
M(m,n) . 

N = 0 1 2 3 4 5 6 7 8 9 

13 N = 1 13 16 4 1 13 16 4 1 13 

\ , / 



notice order 4. 

% 

N = 0 1 2 3 4 5 6 7 8 9 




notice order 2 



Now proceding: 



H (1 , 0 ) = h(2,0) 13 (2)(1) 16 (0)(0) +h(0,l) 13 (0)(1) 



+ h(3 r 0) 13 (3)(1) 16 (0)(0) + h(l,l) 13 (1)(1) 



since h(0,0) = h(l,0) = h(2,l) = h(3,l) = 0. 



16 (1 ) ( °) 
16 (1)(0) 
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H(1,0) = 1 13^ 2) 16^ 0) + 1 13*°* 16* 0) 



16 + 1 = 0 mod 17 



For 

H (2 , 0) 



and, 

H ( 3 , 0) 



+ = 



2 13 



(3) 



16 



(0) 



+ 2 13 



16 



(0) 



8+26 



= h(2,0)' 13 (2) (2) ig (°) (°) + h(0,l) i3 ( 2 ) 16 (D (°) 

+ h (3, 0) i3 ( 3 ) ( 2 ) i6 (0)(0) + h (i r i) 13 (2) (2) lg (l)(0) 



= 1 


13 (4 > 


16 (0 » 


+ 1 13 (0) 


16 


(0) = 1 + 1 + =15 (mod 17) 


+ 2 


1 3 (6) 


16 (0 » 


+ 2 13 (2) 


16 


(0) +32+32 



= h ( 2 , 0 ) 13 (2)(3) 16 (0)(0) + h ( 0 , 1) 13 (0)(3) 16 (1)(0) 

+ h ( 3 , 0) 13 (2)(3) 16 (4)(0) +h(l,l) 13 (1)(3) 16 (1)(0) 

= 1 13 (6) 16 (0) + 1 13° 16° = 16+1 = 0 (mod 17) 

+ 2 13 9 16° + 2 13 3 16° + 26+8 



For the second column 



H(0,1) 



H ( 1 , 1 ) 



H ( 2 , 1 ) 



H (.3 , 1 ) 



+ 



h (2, 0) 13 
h (3, 0) 13 



( 2 ) ( 0 ) 

(3) (0) 



16 (0)(1) 

16 (0)(1) 



+ h(0,l) 13 
+ h (1,1) 13 



( 0 ) ( 0 ) 
( 1 ) ( 0 ) 



leUXD 

16 (1)(1) 



= 1 + 16 = 0 (mod 17) 

+ 2 + 32 



+ 



1 13 ( 2 ) ( 1 ) 

2 13 ^ ^ 



16 (0)(1) + (1) 13 (0)(1) 
16 (0)(1) + (2) 13 (1)(1) 



16 (1)(1) 
16 U) (1) 



= 16+16 = 14 (mod 17) 

+ 8 + 416 



= ( 1 ) 13 ( 2 ) 16 (0) (1) + 1 13 (°) ( 2 ) 16 ( 1 )( 1 ) 

+ 2 13 ^ (2) 16 (0)(1) + 2 13 d) (2) 16 (1)C1) 



= 1+16 = 
+ 32 + 512 
= 1 13 (2) (3) 

+ 2 13 (3) (3) 



0 (mod 17) 

16 (0)(1) + 1 13 (0){3) 
16 (0) (1) + 2 13 (1)(3) 



16 

16 



( 1 ) ( 1 ) = 
( 1 ) ( 1 ) 



16 + 16 
+26 + 128 



16 

(mod 1' 
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i.e. , 



H(m,n) = 

two dimensional 
Fermat transform 



6 

0 

15 

0 



0 

14 

0 

16 



The two dimensional transform of 



X = 



2 

15 

0 

0 



1 

0 

0 

0 



will be given by 



X(m,n) = 



p-1 2L-1 
l l x(i,j) a 
j=0 i=0 



lm jn 
2i a p 



So 



X(0,0) =5(0,0) 13<°> (0 > 16«» (0 » + 5(0,1) 13 <0|<0 > 16 (1)(0 > 



+ 5(1,0) 13 (1 > <0 > 16< 0)(0 > 



= 2 + 1 + 15 = 1 (mod 17) 
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x(l,0) 



x( 2,0) 



x(3,0) 



x (0 , 1) 



x (1 , 1) 



- x(0,0) 13 (0)(0) 16 (0)(0) + x(0,l) 13 (0) (1) 16 (1)(0) 



+ x (1, 0) 13 



(1) (1) 16 (0) (0) 



= 2 + 1 + 15(13) = 11 (mod 17 



= x (0 , 0) 13 (0)(2) is 0 ° (°) +x(0/ i) 13 (0) (2) 16 (1)(0) 



+ x (1 , 0) 13 (1) (2) 16 (0) (0) 



= 2 + 1 + (15) (16) = 6 (mod 17) 



= 2 13 (0)(3) 16 (0)(0) + (1) 13 (0)(3) 16 (1)(0) 



+ 15 13 (1) (3) 16 (0)(0) 



2 + 1 + (15)4 = 12 (mod 17) 



= 2 13 (0)(0) 16 (0)(1) + (1) 13 (0)(0) 16 (1)(1) 



+ 15 13 (1 ^°> 16 (0)(1) 



= 2 + 16 + 15 16 (mod 17) 



(2) 13 (0)(1) 16 (0)(1) + (1) 13 (0)(1) 16 (1)(1) 



+ (15) 13 (1)(1) 16 (0)(1) 



= 2 + 16 + 8 = 9 (mod 17) 
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*(2,1) = (2) 13 (0)(2) 16 (0)(1) + (1) 13 (0)(2) 16 (1)(1) 



+ (15) 13 (1)(2) 16 (0)(1) 



= 2 + 16 + (15) (16) = 3 (mod 17) 

x(3,l) = (2) i 3 (°) ( 3 ) i6 (0)(1) + (!) i3(°)( 3 ) 16 (1)(1) 
+ (15) 13 (1) (3) 16 (0) (1) 



= 2 + 16 + (15) (4) = 10 (mod 17) 



So 



X (m,n) 

two dimensional 
Fermat transform 



1 16 

11 9 

5 3 

12 10 



Let y (i, j) be two-dimensional cyclic convolution of 
x and h sequences, then it can be proved that two dimensional 
transform of y is the product of H(m,n) and X(m,n) [17], 
defined by 



p-1 2L-1 

y(iyj) = 2 Pl l l H(m,n)X(m,n)a 2i ini a~ 3n 
n=0 m=0 

i — 0,1, 2L-1 

3 = 0,1, ..., p— 1 
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Coming back to the example : 



y (0/0) = H(0,0)X(0 / 0)13~ (0) (0) 16 _(0) (0 ) +h( 0,1)X(0,1)13" {0) (0) i6~ (0) (0) 
+ H(1,0)X(1,0)13' (0) (1) 16" (0) (0) +H(l,l)X(l,l)13~ (0) (1) 16 _(0) (1) 
+ H(2,0)X(2,0)13 -(0) (2) 16 _(0) (0 } +H (2 , 1) X (2 , 1) 13~ (0 } (1) 16~ (0) (1) 
+ H(3,0)X(3,0)13 -(0) (3) 16~ (0) (0) +H(3,l)X(3,l)13" (0) (0) 16 -(0) (1) 



1 * 6 * / 



y ( 0 •, 0 ) = (6) (1) (1) (1) + (14) (4) (1) (1) + = 10 (mod 17) 



+ (15) (5) (1) (1) + (16) (10) (1) (1) 



id/0) = ( 6)(i)i3- (1 ^°)i6- (0)(0) + (I4)(9)i3- (1 >< 1 >16- (0,(1) 



+ (15) (5) 13" (2) (1) 16" (0) (0) + (16) (10)13 -(1) (3) 16 -(0) (1) 



6 + 126 13 _1 + 75 13~ 2 + 160 13~ 3 



= 16 (mod 17) 



Notice that in this last calculation use of the following 
table has been made: 



13 _1 = ? 13x13 1 = 1 (mod 17) = 1 13 1 = 4 
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So 



13 _1 


= 


4 






also 


16 _1 


= 


16 


13" 2 


= 


4 2 


= 


16 




16- 2 


= 


1 


13" 3 


= 


4 3 


= 


13 




16- 3 


= 


16 


IS" 4 


= 


4 4 


= 


1 




16" 4 


= 


1 


13 -5 


= 


4 5 


= 


4 order 4 




16~ 5 


= 


16 


13' 6 


= 


4 6 


= 


16 




16~ 6 


= 


1 


13 _ 7 


= 


4 7 


= 


13 




16- 7 


= 


16 


I!" 8 


= 


4 8 


= 


1 




16 -8 


= 


1 


13" 9 





4 9 


__ 


4 




16 _9 




16 



) order 2 



Continuing with the example: 



y (2, 0) = (6) (1) 13 _ (2) (0) lg - (0) (0) + ( 14) (9) 13 “ (2) (D 16 “ (0) (1) 



+ (15) (5) 13 _(2)(2) 16 (0) (0) + (16) (10)13 (2) (3) 16~ (0) (1) 



= 16 (mod 17) 

y (3, 0) = (6) (1) 13~ ( 3 ) (°) 16 -(°) (°) + ( 14 ) ( 9 ) 13~ (3) (1) lg - (0) (1) 

+ ( 15 ) ( 5 ) 13 "(3) (2) 16 -(0) (0) + ( 16) (io)13 _(3) ( 3 ) 16 -(0) (1) 

= 16 (mod 17) 
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and for the second column. 



y ( 0 , 1) = (6) (1) 13 (0)(0) 16 (1) (0) + (14) (9) 13 (0)(1) 16' 



+ (15) (5) 13~ (0) (2) 16 -(D (0) + (i 6 ) (10)13 _(0) (3) 16 



= 16 (mod 17) 



y (1 , 1) = (6) (1) 13 (1)(0) 16 (1) (0) + (14) (9) 13 _(1)(1) 16 



+ (15) (5) 13 _(1) (2) 16 _(1) (0) + (16) (10)13 _(1) (3) 16 



= 16 (mod 17) 



y (2 , 1) = (6) (1) 13 “ (2) (0) !6" (1) (0) + (14) (9) 13 (2)(4) 16' 



+ (15) (5) 13" (2) (2) 16” (1) (0) + (16) (10)13~ {2) (3) 16* 



= 10 (mod 17) 



y ( 3 , 1) = (6) (1) 13 -(3)(0) 16 (1) (0) + (14) (9) 13 _(3)(1) 16‘ 
+ (15) (5)13~ (3) (2) 16~ (1) (0) + (16) (10)13~ (3) (3) 16* 



= 16 (mod 17) 



Finally, 



-d) (1) 
-U) (1) 

-d) (1) 
-( 1 ) ( 1 ) 

( 1 ) ( 1 ) 
( 1 ) ( 1 ) 

( 1 ) ( 1 ) 
(1) Cl) 
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10 


16 


y(i/j) = 


l 

2PL 


16 


16 




16 


10 






16 


16 


and since 








1 

2PL 


11 


= 


(2N) _1 


we have 




10 


16 






16 


16 


y(i/j) = 


15 


16- 


10 






16 


16 



= (8 1 ) = 15 (mod 17) 



14 

2 

2 

2 



2 

2 

14 

2 



(mod 17) 



And, since it can be proved that the relationship between 
two dimensional convolution and one dimensional convolution 
is given by [18] : 



y (i+L, j ) = y ( jL+i) 



where 



i = 0 , 1, . . . , L-l 

j = 0,1, . . . , p-1. 
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In this example: 











y (jL+i) 


© 


i = 0, j = 0 


y (0+2 , 0) = y (2 , 0) 




y (0 • 2+0) = y (0 ) 






i.e. y (2,0) = y (0) 


= 2 




© 


i = 1, j= 0 


y (1+2,0) = y (3, 0) 




y (0 • 2+1) = y (1) 






i.e. y (3,0) = y (1) 


= 2 




© 


i = 0, j = 1 


y (0+2 , 1) = y (2, 1) 




y (1 • 2+0) = y (2) 






i.e. y (2,1) = Y(2) 


= 14 




© 


i = 1, j = 1 


y (1+2,1) = y (3, 1) 




y (1 • 2+1) = y ( 3) 






i.e. y (3,1) = y(3) 


= 2 




Note 


that for the 


original sequences 








x (n) 


(0,1,15,2) and 


h(n) 


= (1,2, 0,0) 




y (n) 


= x(n) * h(n) = 


2,2,14 


,2. 



Exactly the same result!! 

In taking two dimensional transforms, p and 2L are 
restricted to be a power of 2: and also p <_ 4b and 2L <_ 4b 
(b = 2 fc bit representation of integers in arithmetic modulo 
F Fermat number) in the example: 
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Since F fc = 2 b + 1 = 17 b = 4 t = 2, 

also p = 2 , L = 2 . 

Notice that N = PL < 8b 2 . 

Thus the length of the sequences that can be convolved 
using two dimensional convolution scheme is proportional 
to the square of the number of bits used in the word length. 

The order in which two dimensional transforms or inverse 
transforms is taken is reversible. But there is some compu- 
tational advantage in taking transform first along the 
direction 2 (length p) and then along the direction 1 (length 
2L ) t because half the x sequences along direction 2 (p) are 
zero and half the n sequences along direction 2 are cyclic 
shifts by one position of the other half n sequences [18] . 

Also while taking the inverse transform, computationally 
it is advantageous to first take the inverse transform 
along direction 1, then along direction 2 [18] . Because 
we need only half the y sequences along direction 2, 
therefore after taking inverse transforms along direction 
1, we can throw away half the terms. 
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APPENDIX B 



BASIC PROPERTIES OF QUADRATIC RESIDUES 



Let 



2 

x = a (mod p) (B.l) 

be a congruence, where p is any odd prime and a is any 
integer. If a 5 0 (mod p) , then the only solution to (B.l) 
is x = 0 (mod p) . Therefore one assumes p | a. For some 
values of a, (B.l) will have a solution, whereas for some 
other values of a, (B.l) will have no solution. 

Definition B.l : Let p be a prime, and let a be any 

integer such that p J( a. One says that a is a quadratic 
residue modulo p provided that 

x^ = a (mod p) (B.2) 

has a solution. Otherwise, one says that a is a quadratic 
nonresidue modulo p . 

Suppose that p is given and consider the problem of 

determining all quadratic residues modulo p. If a is a 

2 

quadratic residue modulo p, then p | a and a = x (mod p) 
for some x. However, since any integer is congruent to one 

of 0,1, p-1 (mod p) , one sees that a must be congruent 

to one of 
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(B. 3) 



1^, 2 2 , . (p-l)^ (mod p) 

-> 

If p is not too large, then this procedure can actually be 
used for computation. 

Example: Let p = 13 

Then a is a quadratic residue modulo 13 if and only if a 
is congruent to one of 

l 2 , 2 2 , 12 2 (mod 13) ; 

that is a is a quadratic residue modulo 13 if and only if 
a = 1,4,9,3,12,10,10,12,3,9,4,1 (mod 13). 

Thus the quadratic residues of 13 are 

1, 3, 4, 9, 10, 12. 

Hence the quadratic nonresidues modulo 13 are 

2, 5, 6, 7, 8, 11. 

Notice that the initial list of quadratic residues 
obtained is symmetric, with each element of the list 
appearing exactly twice. This is a general phenomenon. 
Indeed, one has 

p-x s -x (mod p) (B. 4) 
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so that 



(p-x) 



2 




(B. 5) 



and thus 



2 2 

(p-x) = x (mod p) 



(B . 6 ) 



therefore, if a is a quadratic residue modulo p, then a 
is congruent to one of 



No two integers of (B.7) are congruent modulo p. Hence, 
among the integers 



precisely (p-l)/2 are quadratic residues modulo p and 
precisely (p-l)/2 are quadratic nonresidues modulo p. This 
can be verified in the above example. One finds the 
following notation very convenient: 

Definition B.2 : 

Let p be an odd prime and a an integer such that p )( a. 
The Legendre symbol is defined as follows : 




• • • 9 



p-1.2 , 

£ -y-) mod p 



(B.7) 



1 , 2 



* * * r 



p-i 
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+1 



if a is a quadratic residue modulo p 




(B. 8) 



-1 if a is a quadratic nonresidue modulo p 



The Legendre symbol (^) should not be confused with the 

Jr 

fraction a/p. 



Example : 



Let p = 3 




-1 




1 




1 




-1 



See above example, where it is listed the quadratic residues 
modulo 13, and quadratic nonresidues mod 13. Moreover, 
since 



and 5 is a quadratic nonresidue modulo 13. So is 18, and 
thus 



Properties of Legendre symbol . 

Let p be an odd prime and let a and b be integers such 
that p J( a and p J( b. Then the following results hold 



18 = 5 (mod 13) 



[20J ; 
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1 



(i) 


i— 1 

II 

%,I* 


(ii) 


= 1 (B. 9) 


(iii) 

Proof : 


If a = b mod p then (— ) = (— ) 

P P 


(i) 


2 2 

The congruence x = a mod p has as a 



solution x = a. 



(ii) 


Set a = 1 in result (i) . 


(iii) 


2 

If a = b (mod p) , then the solutions x = a (mod p) 

2 

are the same as the solutions of x E b (mod p) . 



Therefore, the first congruence has solutions 





if and only if the second does. Thus, 




(-) =• (-) 

P P 



The properties of the Legendre symbol given above are very 
elementary. However a property of the symbol which is by 
no means obvious is the following result: 

Theorem 8.3 : (Euler's Criterion): 

Let p be an odd prime and let a be an integer such that 
p | a. Then 
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(mod p) 



(B. 10) 



<|) = a (P-D/2 



Proof : 

By Fermat's theorem (2.1 4J f one has 



( a ( P - i )/ 2 ) 2 



p-1 

a^ 



= 1 



(mod p) 



Thus, if 



h = 



a (p-l)/2 



then 



h 




1 mod p 



and so 



p| (h-1) (h+1) . 



Therefore 



P | h-1. 



or 



p| h +l. 
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and hence 



= a (P -1 )/ 2 = ±1 



(mod p) . 



Now if p is odd, so (J) = +1 if and only if a (P -1 )/ 2 = 1 (mod p) 

Jr 

Consequently, (^) = i 1 if and only if 

ir 



(p-l)/2 E ± x 



(mod p) 



respectively . 



Corollary B.4 : 

Let p be an odd prime, and let a and b be integers such 
that p J( a, p )( b. Then 






(B . 11 ) 



Proof : 

By Euler's Criterion 

<f) = (abl'P- 1 '/ 2 = a (p-D/2 b (p-l)/2 5 ,1,(1, mod p _ 

ir r' ir 

It is an immediate consequence of Corollary (B.4) 

that 

(i) the product of two quadratic residue modulo p is 
a quadratic residue modulo p. 

(ii) the product of two quadratic nonresidue modulo p 
is a quadratic residue modulo p. 
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and 



(iii) the product of a quadratic residue and a 

quadratic nonresidue is a quadratic nonresidue. 

Example: Let p = 13 

By previous calculations 

3 and 12 are quadratic residues mod 13 

and 

2 

3*12 = 36 = 6 (mod 13) is a quadratic residue. 

However 



2 and 5 are quadratic nonresidues mod 13 



and 

2 

2*5 = 10 = 6 (mod 13) is a quadratic residue mod 13. 
Finally 

7 is a quadratic nonresidue mod 13 
10 is a quadratic residue mod 13 
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and 



7*10 = 70 5 (mod 13) is a nonresidue. 

Another consequence of Euler's criterion is the following: 
Corollary 8.5 : 

Let p be an odd prime. Then 

} (P-D/2 

if p 5 1 (mod 4) 

if p = 3 (mod 4) 



& 



= (-1 



In other words , 



<T> ■< 



+1 



-1 



Proof : 

By Euler's criterion. 



(^) = (-1) (P 1)/2 (mod p) 



Therefore, since (“~) 
desired result. 
Notice that 



= ±1 and since p > 2 , we have the 



(_E_) = (ZL) (— ) 

v P P P 



(Zi) = (-1) (P-D/2 

p 
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since (^-) = 1 by (B.9) , therefore, x 2 = -a 2 (mod p) is 
solvable if and only if p = 1 (mod 4). 

Example : 

Let 



2 



x 



= 19 



(mod 23) 



Now 



19 = -4 (mod 23) 



So 



(— ) 
^ 23 ; 



(— — ) = (— ) {——) 2 
k 23’ '23 '23* 



= -1 



2 

Since 23 = 3 (mod 4) . Thus x =19 (mod 23) is not solvable. 
How does one go about computing (^) for p J( a? 

Jr 

Suppose that 




where P t are distinct primes. Since p J( a, one 

sees that p ^ p^. Then by Corollary 4, one has 



Example : 
Let p 



=5 a = -24 

{ ~¥' ) = (= T } ( | } 3 ( | } = 1* (-1) 3 (~1) = 1 
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